Conference paper

BURGET Radek. Visual Area Classification for Article Identification in Web Documents. In: 21st International Workshop on Databases and Expert Systems Applications. Bilbao: IEEE Computer Society, 2010, pp. 171-175. ISBN 978-0-7695-4174-7.
Publication language: English
Original title: Visual Area Classification for Article Identification in Web Documents
Title (cs): Vizuální klasifikace pro identifikaci článků ve webových dokumentech
Pages: 171-175
Proceedings: 21st International Workshop on Databases and Expert Systems Applications
Conference: 9th International Workshop on Web Semantics
Place: Bilbao, ES
Year: 2010
ISBN: 978-0-7695-4174-7
Publisher: IEEE Computer Society
Keywords
article extraction, document cleaning, page segmentation, visual analysis
Annotation
On the World Wide Web, news and other articles are usually published in complex HTML documents that contain many types of additional information that is not explicitly marked. In this paper, we propose a visual information analysis approach to article discovery in complex HTML documents. We use a classification approach to identify the important parts of the article within the page, and we propose an algorithm for detecting the article bounds within the page. Finally, we provide the results of an experimental evaluation.
BibTeX:
@INPROCEEDINGS{
   author = {Radek Burget},
   title = {Visual Area Classification for Article Identification in Web
	Documents},
   pages = {171--175},
   booktitle = {21st International Workshop on Databases and Expert Systems
	Applications},
   year = {2010},
   location = {Bilbao, ES},
   publisher = {IEEE Computer Society},
   ISBN = {978-0-7695-4174-7},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.en.iso-8859-2?id=9292}
}
