FITLayout Framework Manual

Architecture of the Framework

FITLayout operates on a rendered page represented by a box tree. The box tree is obtained by rendering the page and calculating the positions, fonts, colors and other visual features of the individual pieces of content (boxes). The box tree is the input of the page segmentation algorithms.

Page segmentation is the main task implemented in FITLayout. It analyzes the input box tree and produces a tree of visual areas that correspond to the visual blocks detected in the page. The resulting visual area tree may be further processed by area tree operators, which represent independent post-processing steps of the segmentation. These steps may change the organization of the tree, e.g. by grouping several nodes into new areas.

Page rendering and segmentation may be controlled using a provided set of tools. These tools include a visual browser with a graphical user interface that can be used for configuring and executing the individual tasks. In addition, a scriptable processor allows the tasks to be run in batch mode using JavaScript.

Modules

The FITLayout framework consists of the following basic modules:

API – the basic Java interfaces and their generic implementation that define a common application interface for page segmentation methods (see below).
CSSBox bindings – a default implementation of the rendered page source based on the CSSBox rendering engine.
Segmentation – an implementation of a basic page segmentation method that may be further extended by adding custom area tree operators.
Tools – tools for controlling the segmentation process that include a graphical browser of the segmentation result.

Additional modules exist; they will be described later.

The API (cssbox-api) module provides an API shared by all the remaining modules. It contains the following basic Java packages:

org.fit.layout.model – basic Java interfaces used for representing the rendered page (a box tree) and the result of segmentation (an area tree).
org.fit.layout.impl – default implementations of the interfaces from the model package. These implementations may be used as a starting point for further extension in applications.
org.fit.layout.api – interfaces specific to the FITLayout framework itself. They include the different kinds of services described below.
org.fit.layout.gui – common interfaces of a GUI browser used for monitoring the page processing.

Details about the individual interfaces are given in the corresponding sections below.

Services

The FITLayout architecture can be extended by plugins that provide new functionality such as new box tree sources (document renderers), segmentation algorithms, area tree post-processing operators or GUI extensions. The plugins use the standard Java mechanism for creating extensible applications.

The following types of services are recognized:

BoxTreeProvider – a box tree source; i.e. the page renderer. Based on the input parameters (e.g. the page URL), it renders the page and produces the box tree.
AreaTreeProvider – an area tree source; i.e. a basic segmentation algorithm. It takes a box tree as its input and produces a visual area tree that represents the segmented page.
AreaTreeOperator – a post-processing operation applied to the visual area tree. It may perform any operation on the tree such as joining nodes, splitting nodes, extending the hierarchy, etc.
LogicalTreeProvider – an analyzer that takes the final area tree as its input and assigns semantics to selected areas (tree nodes).

Each service is identified by a unique identifier obtained using its getId() method. All the services may accept input parameters. They implement the ParametrizedOperation interface, which makes it possible to obtain information about the required input parameters (their names and types) and to assign values to them.

For accessing the services, FITLayout offers a simple ServiceManager with static methods for locating the services of a given type.
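
As a rough illustration of such a lookup, the following sketch indexes services by their identifiers and discovers them with the standard java.util.ServiceLoader. The Service interface, the ServiceLookupSketch class and both helper methods are simplified stand-ins introduced here; only getId() is taken from the text above. The use of ServiceLoader and the method names are assumptions, and the real ServiceManager may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical stand-in for a FITLayout service; only getId() comes from the manual.
interface Service {
    String getId();
}

// Hypothetical helper; the real ServiceManager may expose different methods.
final class ServiceLookupSketch {

    // Indexes services by identifier; a later duplicate id replaces an earlier one.
    static <T extends Service> Map<String, T> indexById(Iterable<T> services) {
        Map<String, T> result = new LinkedHashMap<>();
        for (T service : services) {
            result.put(service.getId(), service);
        }
        return result;
    }

    // Discovers the implementations registered for the given type
    // (e.g. through META-INF/services entries on the classpath).
    static <T extends Service> Map<String, T> findByType(Class<T> type) {
        return indexById(ServiceLoader.load(type));
    }
}
```

With such an index, a caller can pick a concrete service by the identifier it reports, for example when a configuration file or a script refers to a provider by name.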

Box Tree

The whole rendered page is represented using a Page object. Its getRoot() method obtains a root node of the box tree that represents the page contents. The nodes of the box tree are formed by the Box objects that represent the individual rendered boxes. Each box has a fixed position in the rendered page obtained using the getBounds() method and some more visual properties such as font size, colors, etc. The related methods are defined by a shared ContentRect interface.

The getType() method obtains the box type which is one of the following:

ELEMENT – a box generated by a DOM element
TEXT_CONTENT – a box representing displayed text
REPLACED_CONTENT – a box representing replaced content (an image or other object)

The boxes are organized in a hierarchical structure. The getParentBox(), getChildBox() and getChildCount() methods may be used for traversing the hierarchy. The TEXT_CONTENT and REPLACED_CONTENT boxes are always the leaf nodes of the tree. The ELEMENT nodes may exist anywhere in the tree.
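
The traversal described above can be sketched as a simple depth-first walk. The Box interface below is a reduced stand-in that keeps only the method names mentioned in the text; the exact signatures (an index-based getChildBox() and a nested Type enum) are assumptions, and SimpleBox is a hypothetical in-memory implementation used only to make the sketch runnable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the Box interface; the signatures are assumptions.
interface Box {
    enum Type { ELEMENT, TEXT_CONTENT, REPLACED_CONTENT }

    Type getType();
    int getChildCount();
    Box getChildBox(int index);
}

// Hypothetical in-memory box used only to make the sketch runnable.
final class SimpleBox implements Box {
    private final Type type;
    private final List<Box> children = new ArrayList<>();

    SimpleBox(Type type, Box... children) {
        this.type = type;
        this.children.addAll(Arrays.asList(children));
    }

    public Type getType() { return type; }
    public int getChildCount() { return children.size(); }
    public Box getChildBox(int index) { return children.get(index); }
}

final class BoxTreeWalk {
    // Depth-first traversal that counts the boxes of the given type.
    static int count(Box root, Box.Type type) {
        int total = (root.getType() == type) ? 1 : 0;
        for (int i = 0; i < root.getChildCount(); i++) {
            total += count(root.getChildBox(i), type);
        }
        return total;
    }
}
```

The same recursion pattern applies to any per-box processing, such as collecting the text boxes or computing the bounds of a subtree.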

CSSBox Box Source

The default box source is implemented in the layout-cssbox (CSSBox bindings) module as the CSSBoxTreeProvider class. The individual boxes are represented by BoxNode objects. The CSSBox box source renders an input document identified by its URL. It supports HTML/CSS and PDF documents and is based on the open-source CSSBox rendering engine.

Segmentation

The segmentation algorithm takes a box tree as its input and produces a tree of visual areas. The resulting tree is represented by an AreaTree object. Its getRoot() method obtains the root node of the area tree that represents the segmentation result. Each node of the area tree is represented by an Area object that corresponds to a visual area detected in the page. The root node corresponds to the whole page area; the descendant nodes correspond to smaller detected areas. The leaf areas may contain the actual boxes from the box tree that represent the contents of the area.

The nodes provide basic tree navigation and manipulation methods similar to those of the box tree. All these functions are specified by a shared AreaTreeNode interface.

The position of the area in the rendered page and all its visual features such as fonts and colors may be obtained through the implemented ContentRect interface in the same way as for the individual boxes. However, since the contained boxes may have different visual properties (e.g. different font sizes), the corresponding methods for the visual area (such as getFontSize()) return the average values for the whole area.
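
To make the averaging concrete, here is a minimal sketch that computes the font size of an area from the boxes it contains. The StyledBox class is a hypothetical stand-in, and the unweighted arithmetic mean is an assumption; the actual implementation may weight the values differently, for example by text length.

```java
import java.util.List;

final class AverageFontSizeSketch {

    // Hypothetical minimal box that carries only its text and font size.
    static final class StyledBox {
        final String text;
        final float fontSize;

        StyledBox(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    // Unweighted mean of the font sizes (an assumption);
    // returns 0 for an area that contains no boxes.
    static float averageFontSize(List<StyledBox> boxes) {
        if (boxes.isEmpty()) {
            return 0f;
        }
        float sum = 0f;
        for (StyledBox box : boxes) {
            sum += box.fontSize;
        }
        return sum / boxes.size();
    }
}
```

For an area holding an 18 pt heading and a 12 pt paragraph, this sketch reports 15 pt, which matches neither box exactly. Code that needs the precise font of a piece of text should therefore consult the box, not the area.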

Optionally, the mutual positions of the areas within their parent area may be described by an arbitrary topology. A typical example is a grid topology that represents the area positions using a flexible grid. The position of each area in the topology may be obtained using the getTopology() method and is represented using a generic AreaTopology interface.
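
One possible reading of a flexible grid is sketched below: the grid lines are placed at the distinct edge coordinates of the sibling areas, and each area is described by the cell where it starts and the number of cells it spans. This is only an illustration of the idea. The Rect class, the gridPosition() method and the construction of the grid are assumptions and do not reproduce the AreaTopology interface.

```java
import java.util.List;
import java.util.TreeSet;

final class GridTopologySketch {

    // Hypothetical rectangle: (x1, y1) is the top-left corner, (x2, y2) the bottom-right one.
    static final class Rect {
        final int x1, y1, x2, y2;

        Rect(int x1, int y1, int x2, int y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }
    }

    // Returns {column, row, columnSpan, rowSpan} of the target within the grid
    // induced by the edges of all sibling areas. The target should be one of them.
    static int[] gridPosition(List<Rect> areas, Rect target) {
        TreeSet<Integer> xs = new TreeSet<>();
        TreeSet<Integer> ys = new TreeSet<>();
        for (Rect r : areas) {
            xs.add(r.x1);
            xs.add(r.x2);
            ys.add(r.y1);
            ys.add(r.y2);
        }
        // headSet(v) holds the grid lines strictly before v, so its size is the index of v.
        int column = xs.headSet(target.x1).size();
        int row = ys.headSet(target.y1).size();
        int columnSpan = xs.headSet(target.x2).size() - column;
        int rowSpan = ys.headSet(target.y2).size() - row;
        return new int[] {column, row, columnSpan, rowSpan};
    }
}
```

Because the grid lines follow the content, the columns and rows have varying widths and heights, and an area that stretches across several siblings simply receives a larger span.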

Default Extensible Segmentation Algorithm

The default segmentation algorithm implementation is contained in the segmentation module. It works in the following steps:

The tree of basic visual areas is created. A basic visual area is the area formed by any box from the source box tree that is visually separated from its neighborhood. Generally, the following boxes are considered to be visually separated:
- The root box.
- The boxes that directly contain text.
- The boxes that have a background different from their neighborhood or have a visible border.
For each visually separated box, a corresponding area is created in the tree of basic visual areas.
The tree is processed by the selected area tree operators. Several area tree operators are available for performing common segmentation tasks such as concatenating text lines or finding larger groups of boxes. See the implementations of the AreaTreeOperator interface for reference.
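
The first step can be sketched as follows. The Box and Area classes here are hypothetical and reduced to the properties named in the list above. Attaching each area to the nearest visually separated ancestor is an assumption about how the hierarchy is preserved; the actual implementation may organize the tree differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BasicAreaSketch {

    // Hypothetical box reduced to the properties used by the separation test.
    static final class Box {
        final String name;
        final boolean containsText;           // directly contains text
        final boolean hasDistinctBackground;  // background differs from the neighborhood
        final boolean hasVisibleBorder;
        final List<Box> children = new ArrayList<>();

        Box(String name, boolean containsText, boolean hasDistinctBackground,
                boolean hasVisibleBorder, Box... children) {
            this.name = name;
            this.containsText = containsText;
            this.hasDistinctBackground = hasDistinctBackground;
            this.hasVisibleBorder = hasVisibleBorder;
            this.children.addAll(Arrays.asList(children));
        }
    }

    // Hypothetical area node that remembers the box it was created from.
    static final class Area {
        final Box source;
        final List<Area> children = new ArrayList<>();

        Area(Box source) {
            this.source = source;
        }
    }

    static boolean isVisuallySeparated(Box box, boolean isRoot) {
        return isRoot || box.containsText || box.hasDistinctBackground || box.hasVisibleBorder;
    }

    // The root box always yields an area; the remaining boxes are tested one by one.
    static Area build(Box root) {
        Area rootArea = new Area(root);
        for (Box child : root.children) {
            collect(child, rootArea);
        }
        return rootArea;
    }

    // Separated boxes become areas; other boxes are skipped and their
    // descendants attach to the nearest area above them (an assumption).
    private static void collect(Box box, Area parent) {
        Area target = parent;
        if (isVisuallySeparated(box, false)) {
            target = new Area(box);
            parent.children.add(target);
        }
        for (Box child : box.children) {
            collect(child, target);
        }
    }
}
```

In this sketch, a purely structural wrapper element without text, background or border produces no area of its own, so the resulting tree is usually shallower than the box tree.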

The area nodes in the processed area tree are represented using a custom AreaImpl class.

Tools

The tools module provides the tools for running and controlling the segmentation process: the Processor, which implements the whole segmentation process, and the BlockBrowser, which provides the graphical user interface.

Processor

The processor is a class responsible for executing the complete segmentation process, i.e. for creating the tree of basic visual areas and applying the configured operators to that tree. The basic functionality is defined in an abstract BaseProcessor class. There are two implementations available:

ScriptableProcessor uses JavaScript for configuring the area operators that should be applied.
GUIProcessor allows the configuration of the operators to be modified from outside (typically by the GUI browser).
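
The two-stage process that a processor executes can be sketched as a small pipeline. All types below are heavily reduced, hypothetical stand-ins: the area tree is modelled as a flat list of strings, and the apply() method, the addOperator() method and the RemoveBlankAreas operator are assumptions made for illustration, not the BaseProcessor API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily reduced area tree: a flat list of area labels.
final class AreaTree {
    final List<String> areas = new ArrayList<>();
}

// Reduced stand-in for an operator; getId() follows the manual, apply() is an assumption.
interface AreaTreeOperator {
    String getId();
    void apply(AreaTree tree);
}

// Example operator that drops the areas consisting of whitespace only.
final class RemoveBlankAreas implements AreaTreeOperator {
    public String getId() {
        return "Sketch.RemoveBlankAreas";
    }

    public void apply(AreaTree tree) {
        tree.areas.removeIf(area -> area.trim().isEmpty());
    }
}

final class ProcessorSketch {
    private final List<AreaTreeOperator> operators = new ArrayList<>();

    ProcessorSketch addOperator(AreaTreeOperator operator) {
        operators.add(operator);
        return this;
    }

    // Step 1 creates the basic areas; step 2 applies the operators in their configured order.
    AreaTree segment(List<String> boxes) {
        AreaTree tree = new AreaTree();
        tree.areas.addAll(boxes);
        for (AreaTreeOperator operator : operators) {
            operator.apply(tree);
        }
        return tree;
    }
}
```

The order of the operators matters because each one works on the tree left by its predecessor. In this picture, the two processor implementations differ only in who fills the operator list: a script or the GUI.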

GUI Browser

The BlockBrowser implements the default browser with a Swing GUI. It lets the user choose the box tree provider and the area tree provider and configure the applied operators. For executing the segmentation, an instance of the GUIProcessor is used.