Content collector and document analysis for the M-Eco project

Authors: Jeřábek Jan, Marek Tomáš, Otrusina Lubomír, Rylko Vojtěch, Smrž Pavel, Sznapka Jakub, Šafář Martin, Uherčík Maroš
Licence: required - no fee
Keywords: named entity recognition, finite state automaton, Twitter, MedISys, M-Eco
The system collects data from various sources and makes it accessible to other components of the M-Eco project. The collection focuses on three groups of data: multimedia data such as broadcast news from TV and radio, online news data from MedISys, and social media content from blogs, forums and Twitter messages.
The multimedia data is collected and transcribed by SAIL's Media Mining Indexing System (MMI), which subsequently provides the transcriptions to MedISys via an RSS feed. For later retrieval, links to the original content are part of this RSS feed. MedISys provides these RSS feeds, along with additional annotations and the online news data that MedISys collects, for further processing by the document analysis component. A third source of data collected by the content collector comprises social media content from MedWorm, Twitter, about 85 discussion forums and 45 blogs, written mainly in German.
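As an illustration of how a downstream component might read such a feed, the following is a minimal sketch using only the Python standard library. The sample feed, its URLs and the function name parse_feed are hypothetical; the actual layout of the MMI and MedISys feeds is not described here, so only the standard RSS 2.0 elements title, link and description are assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real MMI/MedISys feed layout is an assumption.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example transcription feed</title>
    <item>
      <title>Evening news transcript</title>
      <link>http://example.org/media/1</link>
      <description>Transcribed text of the broadcast.</description>
    </item>
    <item>
      <title>Radio bulletin transcript</title>
      <link>http://example.org/media/2</link>
      <description>Another transcribed text.</description>
    </item>
  </channel>
</rss>
"""


def parse_feed(xml_text):
    """Return one dict per RSS item, keeping the link to the original content."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "text": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in parse_feed(SAMPLE_FEED):
        print(entry["link"], "-", entry["title"])
```

Keeping the link next to the transcribed text mirrors the statement above that links to the original content travel with the feed for later retrieval.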
Collected documents are pre-processed. Pre-processing includes filtering of irrelevant data, named entity recognition, parsing, tagging, etc. The result is a set of tagged documents, which is stored in the annotated text repository and made available via web services for the indicator detection and signal generation process.
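The pre-processing steps can be pictured with a small sketch. It is not the project's implementation: it assumes a dictionary-based recognizer in which a token-level trie plays the role of the finite state automaton mentioned in the keywords, and it assumes that a document is irrelevant when no entity is found in it. The gazetteer entries, labels and function names (build_trie, tag_entities, preprocess) are all hypothetical.

```python
import re

# Hypothetical gazetteer; the real entity lists are not given in the text.
GAZETTEER = {"swine flu": "DISEASE", "flu": "DISEASE", "Berlin": "LOCATION"}


def build_trie(phrases):
    """Build a token-level trie; the key None marks an accepting state."""
    root = {}
    for phrase, label in phrases.items():
        node = root
        for token in phrase.lower().split():
            node = node.setdefault(token, {})
        node[None] = label
    return root


def tag_entities(tokens, trie):
    """Scan left to right, keeping the longest match at each position."""
    spans = []
    i = 0
    while i < len(tokens):
        node, j, match = trie, i, None
        while j < len(tokens) and tokens[j].lower() in node:
            node = node[tokens[j].lower()]
            j += 1
            if None in node:
                match = (i, j, node[None])
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched phrase
        else:
            i += 1
    return spans


def preprocess(documents, trie):
    """Tokenize, tag, and drop documents without any recognized entity."""
    tagged = []
    for text in documents:
        tokens = re.findall(r"\w+", text)
        spans = tag_entities(tokens, trie)
        if not spans:  # assumed relevance filter
            continue
        entities = [(" ".join(tokens[s:e]), label) for s, e, label in spans]
        tagged.append({"text": text, "tokens": tokens, "entities": entities})
    return tagged


if __name__ == "__main__":
    docs = ["Swine flu cases reported in Berlin", "Nice weather today"]
    for doc in preprocess(docs, build_trie(GAZETTEER)):
        print(doc["entities"])
```

The longest-match rule makes "Swine flu" a single entity instead of tagging "flu" on its own, which is the usual reason for walking an automaton rather than looking up tokens one at a time.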
Research groups:
Licence terms:

Authorized software license


Brno University of Technology, Faculty of Information Technology, IČ 00216305, Božetěchova 2, 612 66 Brno (hereinafter FIT BUT), is entitled to license the authorized software accessible at the "Authorized software" page (hereinafter the authorized software). Anyone who uses the software in any way at least once becomes a user. The user agrees to comply with the following conditions of use.

Before the first use of the software, the user expresses his/her agreement with the following license conditions:

Authorized software

  • may be used only in compliance with these license conditions; the user must ensure that the conditions are also fulfilled by any third party who is able to access the authorized software,
  • is licensed under a license that may not be sold, rented, or otherwise transferred without the permission of FIT BUT,
  • may not be included in another software product, nor may products derived from the original authorized software be distributed, without the permission of FIT BUT, and its internal structure may not be modified in any other way,
  • may not be modified, as a whole or in any of its parts, so that the information about FIT BUT is removed,
  • may not be reverse engineered, decompiled, or modified in any other way.

As the license is granted free of charge, the software is not covered by any guarantee (this applies to the maximum extent possible under the law). The user accepts the software "as is" without any guarantee of any kind, including, but not limited to, any guarantee of suitability for sale, suitability for a particular purpose, absence of flaws, functionality, quality, performance, continuous availability, or compatibility with other software. Brno University of Technology (to the maximum extent possible under the law) disclaims any duty to compensate any expenses connected with the use of the software, now and in the future.

If any of the above conditions is violated, the license is automatically terminated and the user must stop using the authorized software immediately.

