Connectionist Gender Adaptation
	    in a Hybrid Neural Network / Hidden Markov Model
		       Speech Recognition System


	  Victor Abrash, Horacio Franco, Michael Cohen  (SRI)*
	  Nelson Morgan, Yochai Konig 			(ICSI)**

	*  SRI International, 
	   333 Ravenswood Avenue, 
	   Menlo Park, CA 94025 (USA)
  	** International Computer Science Institute, 
	   1947 Center Street,
	   Berkeley, CA 94704 (USA)


	Technical Areas:
	(N) Neural Networks and Stochastic Modeling for Speech 
	    Recognition
	(I) Automatic Speech Recognition/Understanding
	(B) Dialects and Speech Styles in Speech Processing.

	Please send all correspondence to the first author, 
	Victor Abrash


We are developing a hybrid neural network / hidden Markov model
(HMM) based speaker-independent continuous speech recognition system,
using a multi-layer perceptron (MLP) to replace the tied Gaussian
mixtures previously used to estimate emission probabilities in SRI's
DECIPHER system.  Past research [1,2] has shown that this approach can
significantly improve recognition performance by relaxing traditional
parametric and independence assumptions in the HMM framework.
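
In hybrid systems of this kind [1], the MLP outputs are commonly
treated as state posteriors and divided by the state priors to give
scaled likelihoods that serve as HMM emission scores.  The sketch
below illustrates that conversion under this assumption; the function
name and the numbers are illustrative only and are not taken from
DECIPHER.

```python
def scaled_likelihoods(posteriors, priors):
    """Turn P(state | frame) into values proportional to p(frame | state).

    By Bayes' rule, p(frame | state) = P(state | frame) * p(frame) / P(state).
    The factor p(frame) is the same for every state, so it can be dropped
    when comparing states during decoding.
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    return [post / prior for post, prior in zip(posteriors, priors)]


# Hypothetical MLP outputs for one frame over three states, and
# hypothetical state priors (e.g. relative state frequencies in training).
posteriors = [0.7, 0.2, 0.1]
priors = [0.5, 0.25, 0.25]
scaled = scaled_likelihoods(posteriors, priors)
```

Dividing by the prior keeps frequent states from being favored twice,
once by the network and once by the HMM's own transition structure.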


Separate modeling of male and female speech in HMMs can improve
recognition performance [3,4], given enough training data.  The usual
approach doubles the number of parameters to be estimated, making the
models less robust to new speakers.  Furthermore, although there are
some fundamental differences between male and female speech, there are
many more acoustic regularities common to both that should be
exploited for better recognition.


In this paper, we extend our system with a gender-adaptive version of
our previous MLP.  We approach the issue of modeling consistency in our
male and female networks by initializing their weight values from our
fully trained gender-independent network, and continuing training only
on the appropriate subset of the training data.  This work is related to
[5], which focused on gender classification and modeling with
independent MLPs.


When training neural networks, the starting point in weight space often
determines the final solutions available to the network.  In this work,
we perform gender adaptation by training our weights only incrementally
from their starting point, keeping much of the information encoded in
the original gender-independent network.  Conceptually, this procedure
provides some degree of non-linear smoothing between our initial
gender-independent and final gender-dependent parameters.  This method
reduces the training time needed to obtain gender-dependent networks
compared to training from random initial weights, although further
testing is required to determine whether it also yields more accurate
or more robust recognition.
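
The adaptation procedure described above can be sketched as follows.
This is a toy illustration only: a single logistic unit stands in for
the MLP, the data are synthetic, and the learning rate, epoch counts,
and all names are assumptions rather than details of our system.

```python
import math
import random


def predict(w, x):
    """Logistic unit: P(class 1 | x); the last entry of w is the bias."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, data):
    """Mean cross-entropy, written as log(1 + e^z) - y*z."""
    total = 0.0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
        total += math.log1p(math.exp(z)) - y * z
    return total / len(data)


def train(w, data, lr, epochs):
    """Full-batch gradient descent; returns new weights, leaves w untouched."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def make_data(shift, n, rng):
    """Synthetic two-class data; `shift` mimics a gender-specific offset."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [(2 * y - 1) + shift + rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, y))
    return data


rng = random.Random(0)
male = make_data(-0.5, 200, rng)
female = make_data(+0.5, 200, rng)

# Step 1: train a gender-independent model on the pooled data.
gi_weights = train([0.0, 0.0, 0.0], male + female, lr=0.1, epochs=200)

# Step 2: start each gender-dependent model from the gender-independent
# weights and continue training briefly on that gender's data only.
female_weights = train(gi_weights, female, lr=0.1, epochs=20)
male_weights = train(gi_weights, male, lr=0.1, epochs=20)
```

Keeping the second training stage short is what leaves the adapted
weights close to their gender-independent starting point.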


We plan to extend our results to other sources of pronunciation
consistency, such as dialect region, so another goal is to reduce the
number of independent parameters in our gender-dependent networks.  To
this end, we explore different variations of our layered feed-forward
architecture, in which only subsets of the weights and biases are
adapted.  We report word recognition performance versus the number and
location of additional parameters.
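
To make the parameter trade-off concrete, the sketch below counts how
many parameters need a gender-specific copy when only some weights and
biases of a layered feed-forward network are adapted.  The layer sizes
and the choice of adapted subsets are hypothetical and are not the
configurations evaluated in this work.

```python
def count_adapted_parameters(layer_sizes, adapted):
    """Count parameters that receive a gender-specific copy.

    layer_sizes: number of units per layer, input layer first.
    adapted: set of (layer_index, kind) pairs, where kind is "weights"
             or "biases" and layer_index 0 is the first weight layer.
    """
    total = 0
    for i, (n_in, n_out) in enumerate(zip(layer_sizes, layer_sizes[1:])):
        if (i, "weights") in adapted:
            total += n_in * n_out
        if (i, "biases") in adapted:
            total += n_out
    return total


# Hypothetical network: 100 inputs, 50 hidden units, 10 outputs.
layer_sizes = [100, 50, 10]

everything = {(0, "weights"), (0, "biases"), (1, "weights"), (1, "biases")}
full = count_adapted_parameters(layer_sizes, everything)

# Adapt only the hidden-to-output weights and the output biases.
output_only = count_adapted_parameters(
    layer_sizes, {(1, "weights"), (1, "biases")}
)

# Adapt only the biases of both layers.
biases_only = count_adapted_parameters(
    layer_sizes, {(0, "biases"), (1, "biases")}
)
```

In this toy configuration, adapting every parameter adds 5560 values
per gender, the output layer alone adds 510, and the biases alone add
60.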


Experimentally, we obtain a statistically significant 5.9% improvement
in recognition accuracy on a 600-sentence, 1000-word-vocabulary,
no-grammar test set, which is comparable to improvements seen with
gender-dependent HMMs.  This is encouraging, since [3] indicates that
we could do even better using larger speech databases to train our
network.


Refs:
1:  Bourlard, H., N. Morgan, "Merging Multilayer Perceptrons and Hidden
    Markov Models: Some Experiments in Continuous Speech Recognition,"
    Neural Networks: Advances and Applications, Elsevier Science
    Publishers B.V., North Holland, 1991.

2:  Renals, S., N. Morgan, M. Cohen, H. Franco, "Connectionist Probability
    Estimation in the DECIPHER Speech Recognition System,"
    Proceedings of ICASSP-92.

3:  Murveit, H., M. Weintraub, M. Cohen, "Training Set Issues in SRI's
    DECIPHER Speech Recognition System," Proceedings of the DARPA Speech and
    Natural Language Workshop, June 1990.

4:  Paul, D., "The Lincoln Continuous Speech Recognition System:
    Recent Developments and Results," Proceedings of the DARPA Speech and
    Natural Language Workshop, February 1989.

5:  Konig, Y., N. Morgan, C. Chandra, "GDNN: A Gender-Dependent Neural
    Network for Continuous Speech Recognition," International Computer
    Science Institute Technical Report TR-91-071, December 1991.