BUT Recognizer of GeoNames

Authors: Otrusina Lubomír, Smrž Pavel, Sznapka Jakub, Šafář Martin
Licence: required - no fee
Keywords: named entity recognition, geonames.org, finite state automaton
We had several objectives in mind when designing the new named entity recognizer (NER). First, it should avoid the performance bottlenecks common to web-based APIs such as OpenCalais or AlchemyAPI. Second, it needs to achieve excellent precision and recall for geographical features, especially for places in Europe. Finally, the tool should perform disambiguation and normalization alongside the recognition process.
To meet the second objective, we used data exported from the GeoNames.org database, which contains over 10 million geographical features. Efficiency comes from finite state automaton (FSA) technology, which can handle huge lists of names and searches input texts very quickly. We employed the efficient algorithm for constructing a minimal FSA described in Daciuk et al. (1998). A freely available package provided by the first author of that paper builds a minimal FSA from a list of predefined keywords. The resulting representation of all the relevant data from GeoNames (originally over 1.1 GB) takes only 71 MB, and processing is extremely fast.
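The idea of matching a large name list against input text with an automaton can be illustrated with a minimal sketch. It is not the tool's implementation and does not use the Daciuk package: it builds a plain token-level trie (a deterministic acyclic automaton that has not been minimized) and reports the longest name starting at each position. The function names `build_trie` and `find_names`, the sample names, and the simple whitespace tokenization are assumptions made for illustration only.

```python
# Illustrative sketch only: a plain trie, not a minimized FSA.
_END = object()  # sentinel key marking "a complete name ends here"

_PUNCT = ".,;:!?()\"'"


def build_trie(names):
    """Build a token-level trie from an iterable of place names."""
    root = {}
    for name in names:
        node = root
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node[_END] = name  # store the original spelling of the name
    return root


def find_names(text, trie):
    """Return (start, end, name) token spans, preferring the longest match."""
    tokens = text.split()
    matches = []
    i = 0
    while i < len(tokens):
        node = trie
        best = None
        j = i
        while j < len(tokens):
            key = tokens[j].lower().strip(_PUNCT)
            if key not in node:
                break
            node = node[key]
            j += 1
            if _END in node:
                best = (i, j, node[_END])  # remember the longest match so far
        if best is not None:
            matches.append(best)
            i = best[1]  # continue after the matched name
        else:
            i += 1
    return matches


if __name__ == "__main__":
    sample_names = ["Brno", "Frankfurt", "Frankfurt am Main", "New York", "York"]
    trie = build_trie(sample_names)
    text = "She flew from Frankfurt am Main to New York, then drove to Brno."
    for start, end, name in find_names(text, trie):
        print(start, end, name)
```

A minimal FSA additionally merges equivalent suffix states, which is what makes a very large name list fit into a compact representation; the lookup loop stays essentially the same.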
Licence terms:

Authorized software license


Brno University of Technology, Faculty of Information Technology, IČ 00216305, Božetěchova 2, 612 66 Brno (hereinafter "FIT BUT"), is entitled to license the authorized software accessible at the "Authorized software" page http://www.fit.vutbr.cz/research/prod/index.php?id=177 (hereinafter "authorized software"). Anyone who uses the software in any way at least once becomes a user. The user agrees to comply with the following conditions of use.

Before the first use of the software, the user has expressed his/her agreement with the following license conditions:

Authorized software

  • may only be used in compliance with these license conditions; the user must ensure that any third party who is able to access the authorized software also fulfills these conditions,
  • may not have its license sold, rented, or otherwise transferred without the permission of FIT BUT,
  • may not be included in other software products, nor may products derived from the original authorized software be distributed, without the permission of FIT BUT; its internal structure may not be modified in any other way,
  • may not be modified, as a whole or in any of its parts, so that the information about FIT BUT is removed,
  • may not be reverse-engineered, decompiled, or modified in any other way.

As the license is granted free of charge, the software is provided without any warranty (to the maximum extent permitted by law). The user accepts the software "as is" without warranty of any kind, including, but not limited to, warranties of merchantability, fitness for a particular purpose, absence of defects, functionality, quality, performance, continuous availability, or compatibility with other software. Brno University of Technology (to the maximum extent permitted by law) disclaims any duty to compensate expenses connected with the use of the software now and in the future.

If any of the above conditions is violated, the license is automatically terminated and the user must stop using the authorized software immediately.
