Publication Details

Extrakce informace z WWW na základě znalosti struktury dat

BURGET Radek. Extrakce informace z WWW na základě znalosti struktury dat. In: Sborník příspěvků 2. ročníku konference Znalosti 2003. Ostrava: Faculty of Electrical Engineering and Computer Science, VSB-TU Ostrava, 2003, pp. 271-280. ISBN 80-248-0229-5.

English title

Information Extraction from WWW based on the data structure knowledge

Type

conference paper

Language

Czech

Authors

Burget Radek, doc. Ing., Ph.D. (DIFS FIT BUT)

Keywords

Information Extraction, HTML, XML

Abstract

This paper deals with modelling the logical structure of a Web site and using such a model for information extraction. It proposes an algorithm for creating a site model based on HTML code analysis, and an XML/XSL-based system for extracting information from this model. Furthermore, it discusses the possibility of using tree-matching algorithms to automate the extraction process.

Published

2003

Pages

271-280

Proceedings

Sborník příspěvků 2. ročníku konference Znalosti 2003

Conference

Znalosti 2003, Ostrava, CZ

ISBN

80-248-0229-5

Publisher

Faculty of Electrical Engineering and Computer Science, VSB-TU Ostrava

Place

Ostrava, CZ

BibTeX

@INPROCEEDINGS{FITPUB7136,
   author = "Radek Burget",
   title = "Extrakce informace z WWW na z\'{a}klad\v{e} znalosti struktury dat",
   pages = "271--280",
   booktitle = "Sborn\'{i}k p\v{r}\'{i}sp\v{e}vk\r{u} 2. ro\v{c}n\'{i}ku konference Znalosti 2003",
   year = 2003,
   location = "Ostrava, CZ",
   publisher = "Faculty of Electrical Engineering and Computer Science, VSB-TU Ostrava",
   ISBN = "80-248-0229-5",
   language = "czech",
   url = "https://www.fit.vut.cz/research/publication/7136"
}