Publication Details

Extrakce informace z WWW na základě znalosti struktury dat

BURGET Radek. Extrakce informace z WWW na základě znalosti struktury dat. In: Sborník příspěvků 2. ročníku konference Znalosti 2003. Ostrava: Faculty of Electrical Engineering and Computer Science, VSB-TU Ostrava, 2003, pp. 271-280. ISBN 80-248-0229-5.
English title
Information Extraction from WWW based on the data structure knowledge
Type
conference paper
Language
Czech
Authors
Radek Burget
Keywords

Information Extraction, HTML, XML

Abstract

This paper deals with modelling the logical structure of a Web site and using such a model for information extraction. It proposes an algorithm for creating a site model based on HTML code analysis and an XML/XSL-based system for extracting information from this model. Furthermore, the possible use of tree matching algorithms for automating the extraction process is discussed.

Published
2003
Pages
271-280
Proceedings
Sborník příspěvků 2. ročníku konference Znalosti 2003
Conference
Znalosti 2003, Ostrava, CZ
ISBN
80-248-0229-5
Publisher
Faculty of Electrical Engineering and Computer Science, VSB-TU Ostrava
Place
Ostrava, CZ
BibTeX
@INPROCEEDINGS{FITPUB7136,
   author = "Radek Burget",
   title = "Extrakce informace z WWW na z\'{a}klad\v{e} znalosti struktury dat",
   pages = "271--280",
   booktitle = "Sborn\'{i}k p\v{r}\'{i}sp\v{e}vk\r{u} 2. ro\v{c}n\'{i}ku konference Znalosti 2003",
   year = 2003,
   location = "Ostrava, CZ",
   publisher = "Faculty of Electrical Engineering and Computer Science, VSB-TU Ostrava",
   ISBN = "80-248-0229-5",
   language = "czech",
   url = "https://www.fit.vut.cz/research/publication/7136"
}