N. Brummer: Optimization of Accuracy and Calibration of Binary and Multiclass Pattern Recognizers for Wide Ranges of Applications

Lecture room A112, FIT VUT Bozetechova, 10:00-12:00, 25 March 2009

It is common practice in many fields of basic pattern recognition research to evaluate performance as the misclassification error-rate on a given evaluation database. A limitation of this approach is that it implicitly assumes that all types of misclassification have equal cost and that the prior class distribution equals the relative proportions of classes in the evaluation database.

In this talk, we generalize the traditional error-rate evaluation to create an evaluation criterion that allows optimization of pattern recognizers for a wide range of applications with different class priors and misclassification costs. We further show that the same strategy optimizes the amount of relevant information that recognizers deliver to the user.

In particular, we consider a class of evaluation objectives known as "proper scoring rules", which effectively optimize the ability of pattern recognizers to make minimum-expected-cost Bayes decisions. In this framework, we design our pattern recognizers to:

- extract from the input as much relevant information as possible about the unknown classes, and
- output this information in the form of well-calibrated class likelihoods.

We refer to this form of output as "application-independent". When application-specific priors and costs are then supplied, the likelihoods can be used in a straightforward and standard way to make minimum-expected-cost Bayes decisions.
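For concreteness, the standard minimum-expected-cost Bayes decision recipe referred to above can be sketched as follows. This is an illustrative sketch, not code from the talk; the function name and data layout are my own.

```python
import math

def bayes_decision(log_likelihoods, log_priors, costs):
    """Pick the class whose expected misclassification cost is lowest.

    log_likelihoods: {class: log P(data | class)}  -- recognizer output
    log_priors:      {class: log P(class)}         -- application-specific
    costs:           {(decided, true): cost}       -- application-specific
    """
    # Combine likelihoods with priors in the log domain, then normalize
    # to obtain posterior class probabilities.
    log_post = {c: log_likelihoods[c] + log_priors[c] for c in log_likelihoods}
    norm = math.log(sum(math.exp(v) for v in log_post.values()))
    post = {c: math.exp(v - norm) for c, v in log_post.items()}

    # Expected cost of each candidate decision under the posterior.
    def expected_cost(decided):
        return sum(costs[(decided, true)] * post[true] for true in post)

    return min(post, key=expected_cost)
```

Note that with a flat prior and symmetric costs this reduces to picking the maximum-likelihood class, i.e. ordinary error-rate-minimizing classification; the priors and costs are exactly where the application-specific part enters.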

A given proper scoring rule can be interpreted as a weighted combination of misclassification costs, with a weight distribution over different costs and/or priors. On the other hand, proper scoring rules can also be interpreted as generalized measures of uncertainty, and therefore as generalized measures of information. We show that there is a particular weighting distribution which yields the logarithmic proper scoring rule, and for which the associated uncertainty measure is Shannon's entropy, the canonical information measure. We conclude that optimizing the logarithmic scoring rule not only minimizes error-rates and misclassification costs, but also maximizes the effective amount of relevant information delivered to the user by the recognizer.
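As a minimal illustration (not the authors' code), the logarithmic scoring rule simply evaluates the posterior probability a recognizer assigned to the class that actually occurred:

```python
import math

def log_score(posterior, true_class):
    """Logarithmic proper scoring rule for one trial: the negative log of
    the probability assigned to the class that occurred. Lower is better;
    confident errors are punished heavily."""
    return -math.log(posterior[true_class])

def mean_log_score(trials):
    """Empirical cross-entropy over an evaluation set: the average
    logarithmic score in nats. Its expectation under the true class
    distribution is minimized by well-calibrated posteriors, and the
    irreducible part is the Shannon entropy mentioned above."""
    return sum(log_score(p, t) for p, t in trials) / len(trials)
```

A confident correct answer scores near 0, a noncommittal one costs log 2 ≈ 0.69 nats on a two-class trial, and a confident wrong answer is punished severely; this is how the rule rewards both discrimination and calibration at once.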

We discuss separately our strategies for binary and multiclass pattern recognition:

- We illustrate the binary case with the example of speaker recognition, where the calibration of detection scores in likelihood-ratio form is of particular importance for forensic applications.

- We illustrate the multiclass case with examples from the 2007 NIST Language Recognition Evaluation, where we experiment with the language recognizers of 7 different research teams, all of which had been designed with one particular language detection application in mind. We show that by re-calibrating these recognizers via optimization of a multiclass logarithmic scoring rule, they can be successfully applied to thousands of other applications.
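The re-calibration step described above can be sketched as fitting an affine transformation of the raw scores by minimizing the multiclass logarithmic scoring rule. The following is a hypothetical minimal implementation (the names `softmax` and `calibrate`, and the plain-gradient-descent training, are my own simplifications, not the systems used in the evaluation):

```python
import math

def softmax(zs):
    """Convert a vector of log-likelihoods (with flat priors) to posteriors."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def calibrate(scores, labels, steps=2000, lr=0.1):
    """Fit a scale `a` and per-class offsets `b` so that a*score + b
    behaves like well-calibrated log-likelihoods, by minimizing the mean
    multiclass logarithmic score (cross-entropy) with batch gradient descent.

    scores: list of per-class score vectors, one per trial
    labels: list of true class indices
    """
    n_classes = len(scores[0])
    a, b = 1.0, [0.0] * n_classes
    n = len(scores)
    for _ in range(steps):
        grad_a, grad_b = 0.0, [0.0] * n_classes
        for s, y in zip(scores, labels):
            p = softmax([a * sk + bk for sk, bk in zip(s, b)])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)  # dLoss/dz_k
                grad_a += err * s[k]
                grad_b[k] += err
        a -= lr * grad_a / n
        for k in range(n_classes):
            b[k] -= lr * grad_b[k] / n
    return a, b
```

Because the affine parameters are trained against the logarithmic scoring rule itself, the calibrated posteriors can then feed the Bayes decision machinery of any application's priors and costs, which is what makes one recognizer reusable across many applications.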

Slides available from Niko's pages.
Brümmer Niko, Agnitio
