The modeling step in speaker detection strongly influences the classification task, because the quality of the speaker models depends on the parameters estimated in this step. Malformed speaker models can lead to false classifications, false identifications, and false verifications. The initial model parameters influence the final parameters of the speaker models, so different initialization methods are explored in order to obtain optimized speaker models.
Speaker models are represented as Gaussian Mixture Models (GMMs). A GMM is a weighted mixture of multivariate Gaussian distributions, parameterized by the mixture weights and by the mean vectors and covariance matrices of its components. These parameters are estimated with the expectation-maximization (EM) algorithm, which iteratively increases the likelihood of the training data under the model. The EM algorithm requires initial model parameters. Because it converges only to a local maximum of the likelihood, different initial parameters can lead to different local maxima. The effect of different initialization methods on the identification rate is analyzed.
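The following is a minimal sketch of EM training for a GMM started from two different initializations, intended only to illustrate how the starting point can affect the maximum that EM reaches. It assumes diagonal covariance matrices, uses synthetic 2-D data as a stand-in for acoustic feature vectors, and uses two illustrative initializations (random selection of training vectors, and k-means clustering). The function names (`em_gmm`, `init_random`, `init_kmeans`) and parameters such as `var_floor` are hypothetical and are not taken from the source.

```python
import numpy as np

def log_gaussian_diag(X, means, variances):
    """Log density of each row of X under each diagonal Gaussian; shape (N, K)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]  # (N, K, D)
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def logsumexp_rows(a):
    """Numerically stable log(sum(exp(a))) over each row."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def em_gmm(X, weights, means, variances, n_iter=100, tol=1e-6, var_floor=1e-3):
    """EM for a diagonal-covariance GMM; returns parameters and log-likelihood trace."""
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities gamma[n, k] = P(component k | x_n)
        log_joint = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
        log_norm = logsumexp_rows(log_joint)
        history.append(log_norm.sum())  # log-likelihood of the current parameters
        gamma = np.exp(log_joint - log_norm[:, None])
        # M-step: re-estimate weights, means and diagonal variances
        nk = np.maximum(gamma.sum(axis=0), 1e-10)
        weights = nk / nk.sum()
        means = (gamma.T @ X) / nk[:, None]
        variances = (gamma.T @ X ** 2) / nk[:, None] - means ** 2
        # Variance flooring keeps components from collapsing onto single points
        variances = np.maximum(variances, var_floor)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return weights, means, variances, history

def init_random(X, k, rng):
    """Random init: K training vectors as means, global variance, uniform weights."""
    means = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    variances = np.tile(X.var(axis=0), (k, 1))
    weights = np.full(k, 1.0 / k)
    return weights, means, variances

def init_kmeans(X, k, rng, n_iter=20, var_floor=1e-3):
    """k-means init: centroids as means, per-cluster variances and proportions."""
    means = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                means[j] = X[labels == j].mean(axis=0)
    # Final assignment consistent with the last centroid update
    labels = np.argmin(((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2), axis=1)
    counts = np.bincount(labels, minlength=k)
    weights = (counts + 1.0) / (counts.sum() + k)  # add-one smoothing keeps weights > 0
    variances = np.tile(X.var(axis=0), (k, 1))
    for j in range(k):
        if counts[j] > 1:
            variances[j] = X[labels == j].var(axis=0)
    return weights, means, np.maximum(variances, var_floor)

rng = np.random.default_rng(0)
# Synthetic stand-in for feature vectors: three 2-D clusters
X = np.vstack([rng.normal(loc, 0.5, size=(200, 2))
               for loc in ([0.0, 0.0], [4.0, 0.0], [0.0, 4.0])])
for name, init in [("random", init_random), ("k-means", init_kmeans)]:
    w, m, v, hist = em_gmm(X, *init(X, 3, np.random.default_rng(1)))
    print(f"{name:8s} iterations={len(hist):3d} final log-likelihood={hist[-1]:.1f}")
```

Comparing the final log-likelihoods across initializations (and across random seeds) indicates whether EM converged to different local maxima. On well-separated synthetic data both initializations may well reach the same maximum; the differences are expected to matter more for real, overlapping speech features.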
Optimized speaker models reflect the speech characteristics of each speaker as accurately as possible. By avoiding malformed speaker models, the inter-speaker variability is maximized while the intra-speaker variability is minimized. The use of optimal initialization methods therefore improves the robustness and reliability of automatic speaker identification and verification systems.