Conference paper

ZELENÝ Jan and BURGET Radek. Cluster-based Page Segmentation - a fast and precise method for web page pre-processing. In: The Third International Conference on Web Intelligence, Mining and Semantics. Madrid: Association for Computing Machinery, 2013, pp. 1-12. ISBN 978-1-4503-1850-1.
Publication language: English
Original title: Cluster-based Page Segmentation - a fast and precise method for web page pre-processing
Title (cs): Cluster-based Page Segmentation - rychlá a přesná metoda pro předzpracování webových stránek
Pages: 1-12
Proceedings: The Third International Conference on Web Intelligence, Mining and Semantics
Conference: International Conference on Web Intelligence, Mining and Semantics
Place: Madrid, ES
Year: 2013
ISBN: 978-1-4503-1850-1
Publisher: Association for Computing Machinery
Files:
jzeleny.pdf, 445 KB, last modified 2013-01-28 23:02:24
Keywords
VIPS, vision-based page segmentation, clustering, template, template detection
Annotation
Segmenting a web page may be one of the initial steps of information retrieval or content classification performed on that page. While there has been extensive research in this area, existing approaches usually focus on either performance or the quality of the results. Vision-based segmentation is one of the quality-focused methods, which are comparatively slow. This paper proposes an approach for boosting the performance of vision-based algorithms. Our approach is based on concepts of the modern web and a very common scenario in which an entire web site is processed at once. In this scenario, a substantial performance boost can be gained by isomorphic mapping of previous results gathered from pages within the site to other pages on the same site. We provide the results of experiments performed on VIPS, the most common algorithm for page segmentation.
BibTeX:
@INPROCEEDINGS{
   author = {Jan Zelen{\'{y}} and Radek Burget},
   title = {Cluster-based Page Segmentation - a fast and precise method
	for web page pre-processing},
   pages = {1--12},
   booktitle = {The Third International Conference on Web Intelligence,
	Mining and Semantics},
   year = {2013},
   location = {Madrid, ES},
   publisher = {Association for Computing Machinery},
   ISBN = {978-1-4503-1850-1},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php?id=10252}
}
