Conference paper

BURGET Radek. HTML Document Analysis for Information Extraction. In: Proceedings of 8th EEICT conference. Brno: Faculty of Information Technology BUT, 2002, pp. 426-430. ISBN 80-214-2116-9.
Publication language:english
Original title:HTML Document Analysis for Information Extraction
Title (cs):Analýza HTML dokumentů pro extrakci informace
Pages:426-430
Proceedings:Proceedings of 8th EEICT conference
Conference:ELECTRICAL ENGINEERING, INFORMATION AND COMMUNICATION TECHNOLOGIES 2002
Place:Brno, CZ
Year:2002
ISBN:80-214-2116-9
Publisher:Faculty of Information Technology BUT
Keywords
HTML Analysis, Information Extraction
Annotation
Today's World Wide Web contains a vast amount of information stored in HTML documents. However, HTML primarily describes the appearance of documents and provides no facilities for describing the structure of the data they contain. In this paper we propose a model of a Web site that describes the logical structure of the contained data. Furthermore, we propose methods for creating such a model by analyzing the appearance and structure of HTML documents.
BibTeX:
@INPROCEEDINGS{
   author = {Radek Burget},
   title = {HTML Document Analysis for Information Extraction},
   pages = {426--430},
   booktitle = {Proceedings of 8th EEICT conference},
   year = {2002},
   location = {Brno, CZ},
   publisher = {Faculty of Information Technology BUT},
   ISBN = {80-214-2116-9},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php?id=6921}
}
