Maximum mutual information (MMI) has become one of the two de facto methods for sequence-level training of speech recognition acoustic models. This paper aims to isolate, identify and bring forward the implicit modelling decisions induced by the design and implementation of the standard finite state transducer (FST) lattice-based MMI training framework. In particular, the paper investigates whether it is necessary to maintain a preselected numerator alignment, and it highlights the importance of determinizing FST denominator lattices on the fly. On-the-fly FST lattice determinization is shown mathematically to guarantee discrimination at the hypothesis level. Its efficacy is shown empirically by training deep CNN models on an 18K-hour Mandarin dataset and a 2.8K-hour English dataset. On assistant and dictation tasks, the method achieves a relative WER reduction (WERR) of 2.3% to 4.6% over the standard FST lattice-based approach.
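The toy Python sketch below illustrates the general distinction between path-level and hypothesis-level MMI normalization. It is not the paper's implementation. The lattice is given as a hand-written list of paths, and the scores, alignments and words are invented. We assume that "determinization" merges paths with the same word sequence by log-adding their scores, as determinization in the log semiring does. A tropical-semiring variant would instead keep only the best path. The helpers `determinize`, `mmi_fixed_alignment` and `mmi_hypothesis_level` are hypothetical names. With a fixed numerator alignment, the other alignments of the reference sit in the denominator as if they were competitors. After merging, only distinct word sequences compete.

```python
import math
from collections import defaultdict

# Hypothetical toy lattice, written as explicit paths:
# (word sequence, state alignment, combined log score). Values are invented.
paths = [
    (("hello", "world"), (1, 1, 2, 3), -2.0),   # reference, alignment A
    (("hello", "world"), (1, 2, 2, 3), -2.5),   # reference, alignment B
    (("hollow", "world"), (4, 4, 2, 3), -3.0),  # competing hypothesis
]
reference = ("hello", "world")


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def determinize(paths):
    """Assumed log-semiring merge: one entry per unique word sequence,
    scored by log-adding the scores of all its alignments."""
    grouped = defaultdict(list)
    for words, _alignment, score in paths:
        grouped[words].append(score)
    return {words: logsumexp(scores) for words, scores in grouped.items()}


def mmi_fixed_alignment(paths, ref, ref_alignment):
    """Numerator is a single preselected alignment of the reference;
    the denominator sums over every path, including other reference alignments."""
    num = next(s for w, a, s in paths if w == ref and a == ref_alignment)
    den = logsumexp([s for _, _, s in paths])
    return num - den


def mmi_hypothesis_level(paths, ref):
    """Log posterior of the reference word sequence after merging alignments."""
    hyps = determinize(paths)
    return hyps[ref] - logsumexp(list(hyps.values()))


if __name__ == "__main__":
    print(mmi_fixed_alignment(paths, reference, (1, 1, 2, 3)))
    print(mmi_hypothesis_level(paths, reference))
```

In this toy case the hypothesis-level objective (about -0.206) is higher than the fixed-alignment objective (about -0.680). Under the fixed alignment, the reference's second alignment is penalized as though it were an error. Log-semiring merging leaves the total denominator mass unchanged, so only the numerator differs between the two variants here.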