Forensic Speech Analysis is currently being developed using two main methodologies:

  • Automatic methods, which apply digital signal processing algorithms and Bayesian statistics.
  • Acoustic phonetics and phonology, based on acoustic measurements of speech parameters such as the formant frequencies and fundamental frequency of speech segments.

The Institute investigates both approaches within the framework of the FSAAWG (Forensic Speech and Audio Analysis Working Group) of ENFSI (the European Network of Forensic Science Institutes).

Automatic speaker recognition

Of the various parameterisations used to analyse voice signals, such as MFCC, RASTA-MFCC, LPC, PLP and RASTA-PLP, the mel-frequency cepstral coefficients (MFCC) have proved particularly well suited to speech and speaker recognition. Speakers are modelled with Gaussian mixture models (GMM). MFCC feature extraction, several methods of cluster analysis, GMM parameterisation and expectation maximisation (EM) for parameter estimation have been integrated into the software package STx. Further components for simulating automatic speaker recognition processes are the building of “Universal Background Models” (UBM) with “Maximum A Posteriori” (MAP) adaptation, as well as additional speech corpora for training and testing.
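The MAP adaptation step can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM, mean-only adaptation and a relevance factor of 16, which are common choices in the literature but are not confirmed as the settings used in STx. The toy UBM, the synthetic "speaker" frames and all function names are hypothetical and stand in for a trained UBM and real MFCC vectors.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of each frame under each diagonal Gaussian, shape (T, K)."""
    diff = x[:, None, :] - means[None, :, :]                 # (T, K, D)
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def gmm_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood of x under the GMM."""
    lp = np.log(weights) + log_gauss_diag(x, means, variances)
    m = lp.max(axis=1, keepdims=True)                        # log-sum-exp trick
    return float(np.mean(m[:, 0] + np.log(np.exp(lp - m).sum(axis=1))))

def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM towards the frames in x."""
    lp = np.log(weights) + log_gauss_diag(x, means, variances)
    lp -= lp.max(axis=1, keepdims=True)
    post = np.exp(lp)
    post /= post.sum(axis=1, keepdims=True)                  # responsibilities (T, K)
    n = post.sum(axis=0)                                     # soft counts (K,)
    ex = post.T @ x / np.maximum(n, 1e-10)[:, None]          # data means (K, D)
    alpha = (n / (n + relevance))[:, None]                   # adaptation weights
    return alpha * ex + (1.0 - alpha) * means

# Toy UBM with two components in two dimensions (assumed values).
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[0.0, 0.0], [4.0, 4.0]])
ubm_vars = np.ones((2, 2))

# Synthetic enrolment frames for one hypothetical speaker, centred at (1, 1).
rng = np.random.default_rng(0)
frames = rng.normal(loc=1.0, scale=0.5, size=(200, 2))

adapted = map_adapt_means(frames, ubm_weights, ubm_means, ubm_vars)

# Score: average log-likelihood ratio between speaker model and UBM.
llr = (gmm_loglik(frames, ubm_weights, adapted, ubm_vars)
       - gmm_loglik(frames, ubm_weights, ubm_means, ubm_vars))
```

Components that receive many frames move strongly towards the data, while components with little support stay close to the UBM. This is the behaviour that makes the adapted model usable even with short enrolment recordings.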

A “UBM-GMM System” is evaluated in terms of two error types: “false alarm” and “false rejection” (also called a “miss”). The error probabilities for the different decision thresholds are calculated from the system’s “scores”. The “false alarm probability” and the “miss probability” are usually plotted against each other in a “Detection Error Tradeoff Plot” (DET-Plot).
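How the two error probabilities follow from the scores can be sketched as below. The sketch assumes that a trial is accepted when its score is greater than or equal to the threshold, and it uses every observed score as a candidate threshold. The score lists are invented for illustration only. The probit function gives the normal-deviate scale conventionally used for the axes of a DET plot.

```python
from statistics import NormalDist

def det_points(target_scores, nontarget_scores):
    """Return (threshold, p_miss, p_false_alarm) for each candidate threshold.

    Assumed decision rule: accept the trial if score >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for th in thresholds:
        # Miss (false rejection): a same-speaker trial scored below the threshold.
        p_miss = sum(s < th for s in target_scores) / len(target_scores)
        # False alarm: a different-speaker trial scored at or above the threshold.
        p_fa = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        points.append((th, p_miss, p_fa))
    return points

def probit(p, eps=1e-6):
    """Map a probability to the normal-deviate scale, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)

# Invented scores: same-speaker (target) and different-speaker (non-target) trials.
target = [2.1, 1.4, 0.9, 0.3, 1.8]
nontarget = [-1.2, 0.5, -0.3, 1.0, -2.0]

points = det_points(target, nontarget)
det_curve = [(probit(p_fa), probit(p_miss)) for _, p_miss, p_fa in points]
```

Raising the threshold lowers the false alarm probability and raises the miss probability, which is the trade-off the plot makes visible.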

DET-Plots make it possible to compare different speaker and signal conditions, such as the minimum required duration of the speech signal, transmission line characteristics and manner of speaking. They are also suitable for comparing the performance of different recognition methods. The current work aims at a systematic study of the boundary conditions that influence the error rate. Those investigated include sound quality, duration of the voice signal, language dependence, and variability between and within speakers. At present, considerable differences are observed between results obtained from laboratory data and those obtained from field data.