FITLayout Web Page Segmentation Framework
|Authors:||Burget Radek, Milička Martin|
|Licence:||required - no fee|
|Keywords:||web page segmentation, document analysis, text classification, web page rendering|
|FitLayout is an extensible web page segmentation framework written in Java.
It defines a generic Java API for representing a rendered web page and its division into visual areas,
and it provides a base for implementing page segmentation algorithms with a common application interface.
As a sample segmentation method, it implements a previously published segmentation algorithm based on
recursive visual area merging and separator detection. The framework includes tools for post-processing
the segmentation result using various text or visual classification methods. Finally, it also provides tools
for controlling the segmentation process and examining the segmentation results through a graphical user
interface. The segmentation result may be stored as RDF data for later analysis.|
|Free software under the terms of the GNU GPL license.|