Ph.D. Theses


Finite-state based recognition networks for forward-backward speech decoding

Dissertation: 2016
Student: Hannemann Mirko, Dipl.-Ing.
Supervisor: Burget Lukáš, doc. Ing., Ph.D.
Department: Department of Computer Graphics and Multimedia, FIT BUT
Status: defended
Date: 2016-09-26
Keywords
Automatic speech recognition, LVCSR decoding, recognition networks, weighted finite state transducers, N-gram language models, weight pushing
Abstract
Many tasks can be formulated in the mathematical framework of weighted finite state transducers (WFSTs), and automatic speech recognition (ASR) is one of them. Modern ASR makes extensive use of composed probabilistic models, called decoding graphs or recognition networks, which are constructed from individual components via WFST operations such as composition. Each component is a probabilistic knowledge source that constrains the search for the best path through the composed graph, a process called decoding. Using a coherent framework guarantees that the resulting automata are optimal in a well-defined sense: WFSTs can be optimized by determinization and minimization in a given semiring. These algorithms yield the optimal structure for search, and the optimal distribution of weights is achieved by a weight pushing algorithm.

The goal of this thesis is to further develop recipes and algorithms for constructing optimal recognition networks. We introduce an alternative weight pushing algorithm that is suitable for an important class of models: language model transducers, and more generally cyclic WFSTs and WFSTs with failure (back-off) transitions. We also present a recipe for constructing recognition networks that are suitable for decoding backwards in time and that are guaranteed to assign exactly the same probabilities as the forward recognition network. For this purpose, we develop an algorithm for the exact reversal of back-off language models and their corresponding language model transducers.

We apply these backward recognition networks in an optimization technique: in a static-network decoder, we use them in a two-pass decoding setup (forward search followed by backward search). This approach, called tracked decoding, incorporates the first pass into the second pass by tracking hypotheses from the first-pass lattice. It yields significant speed-ups, since it allows decoding with a variable beam width that is usually much smaller than the baseline beam. We also show that the algorithms can be applied in a dynamic-network decoder using an incrementally refining recognition setup, which additionally leads to a partial parallelization of the decoding.
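To make the role of weight pushing concrete, the following is a minimal Python sketch of the classic algorithm in the tropical semiring (min, +), which the thesis generalizes: compute each state's shortest distance to a final state, then reweight arcs so that every state's cheapest continuation costs zero, which makes beam pruning comparable across states. The toy automaton and all names here are hypothetical illustrations, not taken from the thesis; the sketch assumes non-negative weights and a trimmed machine (every state reaches a final state), and it is exactly the setting where the standard algorithm works, whereas the thesis's contribution targets cyclic WFSTs with back-off arcs.

```python
# A minimal sketch of weight pushing (toward the start state) in the
# tropical semiring (min, +). Toy data; assumes non-negative arc weights
# and that every state can reach a final state.
import heapq

def shortest_distance_to_final(num_states, arcs, final):
    """d[q] = tropical shortest distance from state q to any final state.
    arcs: list of (src, dst, weight); final: dict state -> final weight."""
    # Dijkstra on the reversed graph, seeded by the final weights.
    rev = {q: [] for q in range(num_states)}
    for src, dst, w in arcs:
        rev[dst].append((src, w))
    d = {q: float("inf") for q in range(num_states)}
    heap = []
    for q, w in final.items():
        d[q] = w
        heapq.heappush(heap, (w, q))
    while heap:
        dist, q = heapq.heappop(heap)
        if dist > d[q]:
            continue  # stale heap entry
        for src, w in rev[q]:
            if dist + w < d[src]:
                d[src] = dist + w
                heapq.heappush(heap, (d[src], src))
    return d

def push_weights(num_states, arcs, final, start):
    """Reweight arcs as w + d[dst] - d[src]; total path weights are
    preserved up to the residue d[start] emitted at the start state."""
    d = shortest_distance_to_final(num_states, arcs, final)
    pushed_arcs = [(src, dst, w + d[dst] - d[src]) for src, dst, w in arcs]
    pushed_final = {q: w - d[q] for q, w in final.items()}
    return pushed_arcs, pushed_final, d[start]

# Toy acceptor: 0 -> 1 -> 2 (final). After pushing, the best path carries
# weight 0 on every arc; alternatives carry only their excess cost.
arcs = [(0, 1, 2.0), (0, 1, 5.0), (1, 2, 3.0)]
pushed, final, residue = push_weights(3, arcs, {2: 1.0}, start=0)
print(pushed)   # [(0, 1, 0.0), (0, 1, 3.0), (1, 2, 0.0)]
print(residue)  # 6.0 == total weight of the original best path
```

Note that this shortest-distance computation is exactly what breaks down in the setting the abstract describes: in a cyclic language model transducer with back-off transitions, the tropical distances no longer summarize the probability mass correctly, which motivates the alternative pushing algorithm proposed in the thesis.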
ISO 690 Citation
HANNEMANN, Mirko. Finite-state based recognition networks for forward-backward speech decoding. Brno, 2016. Ph.D. thesis. Brno University of Technology, Faculty of Information Technology. Defended 2016-09-26. Supervisor: Lukáš Burget. Available from: http://www.fit.vutbr.cz/study/DP/PD.php?id=517.
BibTeX
@PHDTHESIS{Hannemann2016,
    author   = {Mirko Hannemann},
    title    = {Finite-state based recognition networks for
                forward-backward speech decoding},
    school   = {Brno University of Technology,
                Faculty of Information Technology},
    year     = {2016},
    location = {Brno, CZ},
    url      = {http://www.fit.vutbr.cz/study/DP/PD.php?id=517}
}
