The generation of speaker models is based on acoustic features obtained from speech corpora. From a closed set of speakers, the target speaker has to be identified in an unsupervised identification task.
Training and comparison recordings exist for every speaker. The training set is used to generate parametric speaker models (Gaussian Mixture Models – GMMs), while the test set is needed for the comparisons of all test recordings to all models. The model with the highest similarity (maximum likelihood) is chosen as the target speaker. The efficiency of the identification task is measured as the identification rate (i.e. the number of correctly chosen target speakers).
Aside from biometric commercial applications, the forensic domain is another important field where speaker identification is used. Because speaker identification is a closed-set classification task, it is useful in cases where a target speaker has to be selected from a set of known speakers (e.g. in the case of hidden observations).