Lecture part A
We provide an introduction to the problem of speech recognition using neural models, emphasizing the CTC loss for training and inference when input and output sequences are of different lengths.
Lecture part B
We discuss beam search for use during inference, and how that procedure may be modeled at training time using a Graph Transformer Network. Graph transformers networks are basically weighted finite-state automata with automatic differentiation, that allows us to encode priors into a graph. There are different type of weighted finite-state and different operations including union, Kleene closure, intersection, compose, and forward score. The loss function is usually the difference between to functions. We can easily implement these networks using GTN library.