FITLayout Web Page Segmentation Framework

Authors: Burget Radek, Milička Martin
Licence: required - no fee
Keywords: web page segmentation, document analysis, text classification, web page rendering
FitLayout is an extensible web page segmentation framework written in Java. It defines a generic Java API for representing a rendered web page and its division into visual areas, and it provides a base for implementing page segmentation algorithms behind a common application interface. As a sample segmentation method, it implements a previously published algorithm based on recursive visual area merging and separator detection. The framework includes tools for post-processing the segmentation result with various text or visual classification methods. It also provides a graphical user interface for controlling the segmentation process and examining its results. The segmentation result may be stored as RDF data for later analysis.
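To illustrate what merging visual areas and detecting separators can look like, the following is a minimal Java sketch. It does not use the FitLayout API and is not the published algorithm. The names `AreaMergeSketch`, `Area`, `mergeVertically` and `maxGap` are hypothetical and introduced only for this example. The simplifying assumption is that two boxes belong to the same area when they overlap horizontally and the vertical gap between them is small, and that a larger gap acts as a separator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AreaMergeSketch {

    /** Hypothetical visual area: a bounding box in page pixels plus the areas merged into it. */
    static final class Area {
        final int x1, y1, x2, y2;
        final List<Area> children = new ArrayList<>();

        Area(int x1, int y1, int x2, int y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }

        /** True when the horizontal extents of the two boxes intersect. */
        boolean overlapsHorizontally(Area o) {
            return x1 < o.x2 && o.x1 < x2;
        }

        /** Smallest box covering both areas; the originals become its children. */
        static Area union(Area a, Area b) {
            Area u = new Area(Math.min(a.x1, b.x1), Math.min(a.y1, b.y1),
                              Math.max(a.x2, b.x2), Math.max(a.y2, b.y2));
            u.children.add(a);
            u.children.add(b);
            return u;
        }
    }

    /**
     * One bottom-up pass: sort areas by their top edge, then merge each area into
     * the previous result when the two overlap horizontally and the vertical gap
     * is at most maxGap. A larger gap is treated as a separator (assumption).
     */
    static List<Area> mergeVertically(List<Area> areas, int maxGap) {
        List<Area> sorted = new ArrayList<>(areas);
        sorted.sort(Comparator.comparingInt((Area a) -> a.y1));
        List<Area> result = new ArrayList<>();
        for (Area next : sorted) {
            if (!result.isEmpty()) {
                Area last = result.get(result.size() - 1);
                if (last.overlapsHorizontally(next) && next.y1 - last.y2 <= maxGap) {
                    // Replace the previous area with the merged parent area.
                    result.set(result.size() - 1, Area.union(last, next));
                    continue;
                }
            }
            result.add(next);
        }
        return result;
    }
}
```

For three stacked boxes separated by gaps of 5 and 55 pixels, calling `mergeVertically` with `maxGap = 10` merges the first two into one parent area and leaves the third on its own. Running the pass again on its own output with a larger threshold would add another level to the hierarchy, which is one simple way to approximate a recursive merge.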
Research groups:
Licence terms:
Free software under the terms of the GNU General Public License (GNU GPL).

