We had several objectives in mind when designing the new NER tool. First, it should avoid the performance bottlenecks common to web-based APIs such as OpenCalais or AlchemyAPI. Second, it needs to achieve excellent precision and recall for geographical features, especially for places in Europe. Finally, the tool should perform disambiguation and normalization alongside the recognition process. To meet the second objective, we used data exported from the GeoNames.org database, which contains over 10 million geographical features. Efficiency comes from finite state automaton (FSA) technology, which can handle huge lists of names and searches input texts very quickly. We employed the efficient algorithm for constructing a minimal FSA described in Daciuk et al. (1998). A freely available package provided by the first author of that paper allows building a minimal FSA from a list of pre-defined keywords. The resulting representation of all the relevant data from GeoNames (originally over 1.1 GB) takes only 71 MB, and processing is extremely fast.
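The idea of dictionary-based recognition with normalization can be illustrated with a minimal sketch. It is not the tool's implementation: it uses a plain token-level trie (a deterministic automaton without the state minimization of Daciuk et al.), and the function names, the tiny gazetteer, and the numeric feature IDs are all made up for illustration. A minimal FSA would additionally merge equivalent states, which is where most of the size reduction would come from.

```python
import re
from typing import Dict, List, Tuple

TOKEN = re.compile(r"\w+")  # simple tokenizer; an assumption, not the tool's


def build_trie(names: Dict[str, int]) -> dict:
    """Build a token-level trie; the key None marks the end of a name
    and stores its (hypothetical) feature ID."""
    root: dict = {}
    for name, feature_id in names.items():
        node = root
        for token in TOKEN.findall(name.lower()):
            node = node.setdefault(token, {})
        node[None] = feature_id
    return root


def find_places(text: str, trie: dict) -> List[Tuple[str, int]]:
    """Scan the text left to right and return (surface form, feature ID)
    pairs, always preferring the longest name starting at a position."""
    tokens = TOKEN.findall(text)
    matches: List[Tuple[str, int]] = []
    i = 0
    while i < len(tokens):
        node = trie
        best = None  # (end index, feature ID) of the longest match so far
        j = i
        while j < len(tokens) and tokens[j].lower() in node:
            node = node[tokens[j].lower()]
            j += 1
            if None in node:
                best = (j, node[None])
        if best is None:
            i += 1  # no name starts here; move on one token
        else:
            end, feature_id = best
            matches.append((" ".join(tokens[i:end]), feature_id))
            i = end  # continue after the matched name
    return matches


# Made-up gazetteer: the IDs are placeholders, not real GeoNames IDs.
gazetteer = {"Brno": 1, "Ústí nad Labem": 2, "Ústí": 3}
trie = build_trie(gazetteer)
result = find_places("We travelled from Brno to Ústí nad Labem.", trie)
print(result)
```

Here the longest-match rule picks "Ústí nad Labem" rather than the shorter entry "Ústí", and each match is returned together with an identifier, which is one simple way to couple recognition with normalization.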
Authorized software license
Brno University of Technology, Faculty of
Information Technology, IČ 00216305, Božetěchova 2, 612 66 Brno (hereinafter "FIT BUT"),
is entitled to license the authorized software accessible at the "Authorized
software" page http://www.fit.vutbr.cz/research/prod/index.php?id=177 (hereinafter "authorized software").
Anyone who uses the software in any way, even once, becomes a user. The
user agrees to comply with the following conditions of use.
Before first use of the software, the user
has expressed his/her agreement with the following license conditions:
The following conditions apply to the authorized software:
- it may be used only in compliance with these license conditions; the user
must ensure that the conditions are also fulfilled by any third party who is
able to access the authorized software,
- the license may not be sold, rented, or otherwise
transferred without the permission of FIT BUT,
- it may not be included in another software
product, nor may such products derived from the original authorized
software be distributed, without the permission of FIT BUT, nor may its
internal structure be modified in any other way,
- it may not be modified, as a whole or in any of its
parts, so that the information about FIT BUT is removed,
- it may not be reverse engineered, decompiled,
or modified in any other way.
As the license is granted free of charge, the
software is not covered by any guarantee (to the maximum extent
possible under the law). The user accepts the software "as is" without any
guarantee of any kind, including, but not limited to, guarantees of suitability
for sale, suitability for a particular purpose, absence of flaws, functionality,
quality, performance, continuous availability, or compatibility with other
software. Brno University of Technology (to the maximum extent possible under
the law) disclaims any duty to compensate expenses connected with the use
of the software now and in the future.
If any of the above conditions is violated, the
license is automatically terminated and the user must stop using the authorized
software immediately.