Connectionist Gender Adaptation in a Hybrid Neural Network / Hidden Markov Model Speech Recognition System

Victor Abrash, Horacio Franco, Michael Cohen (SRI)*
Nelson Morgan, Yochai Konig (ICSI)**

* SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025 (USA)
** International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704 (USA)

Technical Areas:
(N) Neural Networks and Stochastic Modeling for Speech Recognition
(I) Automatic Speech Recognition/Understanding
(B) Dialects and Speech Styles in Speech Processing

Please send all correspondence to the first author, Victor Abrash.

We are developing a hybrid neural network / hidden Markov model (HMM) based speaker-independent continuous speech recognition system, using a multi-layer perceptron (MLP) to replace the tied Gaussian mixtures previously used to estimate emission probabilities in SRI's DECIPHER system. Past research [1,2] has shown that this approach can significantly improve recognition performance by relaxing the traditional parametric and independence assumptions of the HMM framework. Separate modeling of male and female speech in HMMs can improve recognition performance [3,4], given enough training data. However, the usual approach doubles the number of parameters to be estimated, making the models less robust to new speakers. Furthermore, although there are some fundamental differences between male and female speech, there are many more acoustic regularities common to both, which should be exploited for better recognition. In this paper, we extend our system with a gender-adaptive version of our previous MLP. We approach the issue of modeling consistency in our male and female networks by initializing their weight values from our fully trained gender-independent network, and continuing training only on the appropriate subset of the training data. This work is related to [5], which focused on gender classification and modeling with independent MLPs.
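The initialization scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the system's actual implementation: the one-hidden-layer architecture, sigmoid hidden units, softmax outputs, cross-entropy loss, full-batch gradient descent, layer sizes, learning rates, and all function names (`init_mlp`, `forward`, `train`, `adapt_to_gender`) are hypothetical choices made for the example, and the data are synthetic.

```python
import numpy as np


def init_mlp(n_in, n_hidden, n_out, rng):
    """Small random initial weights for a one-hidden-layer MLP (assumed shape)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, x):
    """Return sigmoid hidden activations and softmax class posteriors."""
    h = 1.0 / (1.0 + np.exp(-(x @ params["W1"] + params["b1"])))
    logits = h @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return h, p / p.sum(axis=1, keepdims=True)


def train(params, x, y, lr, epochs):
    """Full-batch gradient descent on cross-entropy.

    Works on a copy, so the starting weights are left untouched.
    """
    params = {k: v.copy() for k, v in params.items()}
    n = x.shape[0]
    for _ in range(epochs):
        h, p = forward(params, x)
        d_logits = p.copy()
        d_logits[np.arange(n), y] -= 1.0  # gradient of cross-entropy w.r.t. logits
        d_logits /= n
        d_h = (d_logits @ params["W2"].T) * h * (1.0 - h)
        grads = {
            "W2": h.T @ d_logits,
            "b2": d_logits.sum(axis=0),
            "W1": x.T @ d_h,
            "b1": d_h.sum(axis=0),
        }
        for k in params:
            params[k] -= lr * grads[k]
    return params


def adapt_to_gender(gi_params, x, y, is_female, lr=0.05, epochs=20):
    """Start both gender-dependent nets from the gender-independent weights
    and continue training each one only on its own subset of the data."""
    male = train(gi_params, x[~is_female], y[~is_female], lr, epochs)
    female = train(gi_params, x[is_female], y[is_female], lr, epochs)
    return male, female


# Synthetic stand-in for acoustic feature frames and phone-class labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 12))
y = rng.integers(0, 5, size=200)
is_female = rng.random(200) < 0.5

# Train the gender-independent network first, then adapt from it.
gi_params = train(init_mlp(12, 16, 5, rng), x, y, lr=0.1, epochs=50)
male_params, female_params = adapt_to_gender(gi_params, x, y, is_female)
```

Because each gender-dependent network starts from a copy of the gender-independent weights and takes only a limited number of small steps, it stays close to that starting point, which is the sense in which the adaptation is incremental.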
When training neural networks, the starting point in weight space often determines the final solutions available to the network. In this work, we perform gender adaptation by training our weights only incrementally from their starting point, keeping much of the information encoded in the original gender-independent network. Conceptually, this procedure provides some degree of non-linear smoothing between our initial gender-independent and final gender-dependent parameters. This method reduces the training time needed to obtain gender-dependent networks compared to training from random initial weights, although further testing is required to determine whether it also yields more accurate or robust recognition.

We plan to extend our results to other sources of pronunciation consistency, such as dialect region, so another goal was to reduce the number of independent parameters in our gender-dependent networks. To this end, we explore different variations of our layered feed-forward architecture, in which only subsets of the weights and biases are adapted. We report word recognition performance versus the number and location of additional parameters.

Experimentally, we obtain a statistically significant 5.9% improvement in recognition accuracy on a 600-sentence, 1000-word vocabulary, no-grammar test set, which is comparable to the improvements seen with gender-dependent HMMs. This is encouraging, since [3] indicates that we could do even better using larger speech databases to train our network.

Refs:
1: Bourlard, H., N. Morgan, "Merging Multilayer Perceptrons and Hidden Markov Models: Some Experiments in Continuous Speech Recognition," Neural Networks: Advances and Applications, Elsevier Science Publishers B.V., North Holland, 1991.
2: Renals, S., N. Morgan, M. Cohen, H. Franco, "Connectionist Probability Estimation in the DECIPHER Speech Recognition System," Proceedings ICASSP-92.
3: Murveit, Hy, M. Weintraub, M. Cohen, "Training Set Issues in SRI's DECIPHER Speech Recognition System," Proceedings of the DARPA Speech and Natural Language Workshop, June 1990.
4: Paul, Douglas, "The Lincoln Continuous Speech Recognition System: Recent Developments and Results," Proceedings of the DARPA Speech and Natural Language Workshop, February 1989.
5: Konig, Y., N. Morgan, C. Chandra, "GDNN: A Gender-Dependent Neural Network for Continuous Speech Recognition," International Computer Science Institute Technical Report TR-91-071, December 1991.