Prof. Ing. Adam Herout, Ph.D.

HRADIŠ Michal, BERAN Vítězslav, ŘEZNÍČEK Ivo, HEROUT Adam, BAŘINA David, VLČEK Adam and ZEMČÍK Pavel. Brno University of Technology at TRECVid 2010. In: TRECVID 2010: Participant Notebook Papers and Slides. Gaithersburg, MD: National Institute of Standards and Technology, 2010, p. 11.
Publication language: English
Original title: Brno University of Technology at TRECVid 2010
Title (cs): Brno University of Technology at TRECVid 2010
Proceedings: TRECVID 2010: Participant Notebook Papers and Slides
Conference: 2010 TRECVID Workshop
Place: Gaithersburg, MD, US
Publisher: National Institute of Standards and Technology
TRECVID, semantic indexing, Content-based Copy Detection, image classification
This paper describes our approaches to semantic indexing and content-based copy detection, as used in the TRECVID 2010 evaluation.

Semantic indexing

1.  The runs differ in the types of visual features used. All runs use several bag-of-words representations, each fed to a separate linear SVM; the SVM outputs are fused by logistic regression.

  • F_A_Brno_resource_4: Only the single best type of visual feature (as measured on the training set) is used: dense image sampling with rgb-SIFT.
  • F_A_Brno_basic_3: This run uses dense sampling and the Harris-Laplace detector in combination with SIFT and rgb-SIFT descriptors.
  • F_A_Brno_color_2: This run extends F_A_Brno_basic_3 by adding dense sampling with rg-SIFT, Opponent-SIFT, Hue-SIFT, HSV-SIFT, C-SIFT and opponent histogram descriptors.
  • F_A_Brno_spacetime_1: This run extends F_A_Brno_color_2 by adding space-time visual features STIP and HESSTIP.
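
The fusion step shared by all four runs (one classifier score per feature type, combined by logistic regression) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the plain gradient-descent training, the learning rate and the toy scores are all hypothetical, and the per-feature SVM outputs are taken as given rather than trained here.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_fusion(scores, labels, lr=0.1, epochs=500):
    """Fit logistic-regression fusion weights by batch gradient descent.

    scores: one row per video shot, one column per feature-specific SVM
            (hypothetical decision values, not real TRECVID data).
    labels: 1 if the concept is present in the shot, else 0.
    """
    n = len(scores)
    n_features = len(scores[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(scores, labels):
            p = sigmoid(sum(w * s for w, s in zip(weights, row)) + bias)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for j, s in enumerate(row):
                grad_w[j] += err * s
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def fuse(weights, bias, row):
    # Fused probability that the concept is present in one shot.
    return sigmoid(sum(w * s for w, s in zip(weights, row)) + bias)

# Toy example: two feature types (say, dense rgb-SIFT and Harris-Laplace SIFT).
svm_scores = [[1.2, 0.8], [0.9, 1.5], [-1.1, -0.4], [-0.7, -1.3]]
labels = [1, 1, 0, 0]
weights, bias = train_fusion(svm_scores, labels)
```

Calling `fuse(weights, bias, [1.0, 1.0])` then gives the fused probability for a new shot. Adding a feature type, as the later runs do, corresponds to adding a column to `svm_scores`.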

2. Combining multiple types of visual features improves results significantly. F_A_Brno_color_2 achieves results more than twice as good as those of F_A_Brno_resource_4. The space-time visual features did not improve results.

3. Combining multiple types of visual features is important. Linear SVMs are inferior to non-linear SVMs in the context of semantic indexing.

Content-based Copy Detection

1.    Two runs were submitted with similar settings; they differ only in the amount of test data processed (40% and 60%).

  • brno.m.*.l3sl2: SURF, bag-of-words (visual codebook: 2k size, 4 nearest neighbors used in soft-assignment), inverted file index, geometry (homography) based image similarity metric
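
The soft-assignment bag-of-words and inverted file index named above can be sketched roughly as follows. This is a hedged illustration only: the Gaussian weighting, the `sigma` value, the dot-product score, the function names and the six-word 2-D codebook are assumptions made for the example (the submitted run used SURF descriptors and a 2k-word codebook), and the homography-based geometric similarity step is not shown.

```python
import math
from collections import defaultdict

def soft_assign(descriptor, codebook, k=4, sigma=0.5):
    """Spread one descriptor over its k nearest codewords.

    Returns {codeword_index: weight} with weights summing to 1.
    The Gaussian kernel and sigma are illustrative assumptions.
    """
    nearest = sorted(
        (math.dist(descriptor, word), idx) for idx, word in enumerate(codebook)
    )[:k]
    d_min_sq = nearest[0][0] ** 2
    # Subtracting the smallest squared distance avoids underflow to all zeros.
    raw = {
        idx: math.exp(-(d * d - d_min_sq) / (2.0 * sigma * sigma))
        for d, idx in nearest
    }
    total = sum(raw.values())
    return {idx: w / total for idx, w in raw.items()}

def bow_histogram(descriptors, codebook, k=4, sigma=0.5):
    # Accumulate the soft assignments of all descriptors of one frame.
    hist = defaultdict(float)
    for desc in descriptors:
        for idx, w in soft_assign(desc, codebook, k, sigma).items():
            hist[idx] += w
    return dict(hist)

def build_inverted_index(frame_histograms):
    # Map each visual word to the reference frames that contain it.
    index = defaultdict(list)
    for frame_id, hist in frame_histograms.items():
        for idx, w in hist.items():
            index[idx].append((frame_id, w))
    return index

def query(index, hist):
    # Score only frames sharing at least one visual word with the query.
    scores = defaultdict(float)
    for idx, qw in hist.items():
        for frame_id, w in index.get(idx, []):
            scores[frame_id] += qw * w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy 2-D codebook and descriptors standing in for SURF vectors.
codebook = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5)]
reference = {
    "ref_a": bow_histogram([(0.1, 0.0), (0.9, 1.0)], codebook),
    "ref_b": bow_histogram([(5.1, 5.0), (5.9, 5.0)], codebook),
}
index = build_inverted_index(reference)
ranked = query(index, bow_histogram([(0.0, 0.1), (1.0, 0.9)], codebook))
```

In a full pipeline, the top candidates in `ranked` would then be re-checked with the geometric similarity measure. Because every descriptor votes for four codewords, each posting list grows accordingly, so codebook size and `k` directly affect search cost.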

2.    What if any significant differences (in terms of what measures) did you find among the runs?

  • only one setting used - no differences

3.    Based on the results, can you estimate the relative contribution of each component of your system/approach to its effectiveness?

  • search in the reference dataset was slow due to an unsuitable configuration of the visual codebook

4.    Overall, what did you learn about runs/approaches and the research question(s) that motivated them?

  • the way video content is described needs to change: a frame-based (or key-frame-based) approach is not sufficient
   author = {Michal Hradi{\v{s}} and V{\'{i}}t{\v{e}}zslav
	Beran and Ivo {\v{R}}ezn{\'{i}}{\v{c}}ek and Adam
	Herout and David Ba{\v{r}}ina and Adam Vl{\v{c}}ek
	and Pavel Zem{\v{c}}{\'{i}}k},
   title = {Brno University of Technology at TRECVid 2010},
   pages = 11,
   booktitle = {TRECVID 2010: Participant Notebook Papers and Slides},
   year = 2010,
   location = {Gaithersburg, MD, US},
   publisher = {National Institute of Standards and Technology},
   language = {english},
   url = {}
