Forensic Speech and Audio Analysis

Forensic speech analysis is currently being developed along two main methodological lines:

Automatic methods, which apply digital signal processing algorithms and Bayesian statistics.
Acoustic phonetics and phonology, based on acoustic measurements of speech parameters such as the formant frequencies and fundamental frequency of speech segments.

The Institute investigates both approaches within the framework of the FSAAWG (Forensic Speech and Audio Analysis Working Group) of ENFSI (the European Network of Forensic Science Institutes).

Automatic speaker recognition

Of the various feature sets used to analyse voice signals, such as MFCC, RASTA-MFCC, LPC, PLP and RASTA-PLP, the mel-frequency cepstral coefficients (MFCC) have proved particularly suitable for speech and speaker recognition. Speakers are modelled with Gaussian mixture models (GMM). MFCC feature extraction, several methods of cluster analysis, GMM parameterisation and parameter estimation by expectation maximisation (EM) have been integrated into the software package STx. Further components for simulating automatic speaker recognition are the construction of “Universal Background Models” (UBM), from which speaker models are derived by “Maximum A Posteriori” (MAP) adaptation, and additional speech corpora for training and testing.
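The MAP step and the resulting score can be illustrated with a minimal sketch. It is not the STx implementation: it assumes one-dimensional features instead of MFCC vectors, adapts only the component means, and uses a relevance factor of 16 as an illustrative default. The function names and the toy values are hypothetical.

```python
import math

def gauss(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, gmm):
    """Density of a GMM given as (weights, means, variances)."""
    weights, means, variances = gmm
    return sum(w * gauss(x, m, v) for w, m, v in zip(weights, means, variances))

def map_adapt_means(data, ubm, relevance=16.0):
    """Derive a speaker model from the UBM by MAP adaptation of the means only."""
    weights, means, variances = ubm
    counts = [0.0] * len(means)  # soft count of frames per component
    sums = [0.0] * len(means)    # posterior-weighted sum of frames
    for x in data:
        total = mixture_density(x, ubm)
        for i, (w, m, v) in enumerate(zip(weights, means, variances)):
            post = w * gauss(x, m, v) / total  # component posterior
            counts[i] += post
            sums[i] += post * x
    adapted = []
    for n, s, m in zip(counts, sums, means):
        # Components that saw little data stay close to the UBM mean.
        alpha = n / (n + relevance) if n > 0 else 0.0
        data_mean = s / n if n > 0 else m
        adapted.append(alpha * data_mean + (1.0 - alpha) * m)
    return (weights, adapted, variances)

def llr_score(data, speaker, ubm):
    """Average log-likelihood ratio of speaker model versus UBM."""
    return sum(
        math.log(mixture_density(x, speaker)) - math.log(mixture_density(x, ubm))
        for x in data
    ) / len(data)

# Toy values (assumptions): a two-component UBM and 32 enrolment frames.
ubm = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
enrol = [1.5] * 32
speaker = map_adapt_means(enrol, ubm)
score = llr_score([1.4, 1.6, 1.5], speaker, ubm)
print(speaker[1], score)
```

A positive score means the test frames are better explained by the adapted speaker model than by the background model. A real system would work with multi-dimensional feature vectors and compute in the log domain to avoid numerical underflow.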

A “UBM-GMM system” is evaluated in terms of two error types: “false alarm” (a non-target speaker is accepted) and “false rejection” or “miss” (the target speaker is rejected). For each decision threshold, the corresponding error probabilities are calculated from the system’s “scores”. The “false alarm probability” and the “miss probability” are usually plotted against each other in a “Detection Error Tradeoff Plot” (DET-Plot).
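How the two error probabilities follow from a set of scores can be sketched as below. The sketch assumes that a trial is accepted when its score is at or above the threshold and that every observed score serves as a candidate threshold. The function names and the score values are hypothetical. A DET plot draws both axes on a normal-deviate scale, which the `probit` helper approximates by clipping probabilities away from 0 and 1.

```python
from statistics import NormalDist

def det_points(target_scores, nontarget_scores):
    """Return (threshold, p_false_alarm, p_miss) for every observed score."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        # Miss: a target trial scores below the threshold and is rejected.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial reaches the threshold and is accepted.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((t, p_fa, p_miss))
    return points

def equal_error_rate(points):
    """Approximate the EER at the threshold where both error rates are closest."""
    _, p_fa, p_miss = min(points, key=lambda p: abs(p[1] - p[2]))
    return (p_fa + p_miss) / 2.0

def probit(p, eps=1e-6):
    """Normal-deviate transform for DET axes, clipped to avoid infinities."""
    return NormalDist().inv_cdf(min(max(p, eps), 1.0 - eps))

# Toy scores (assumptions): four target and four non-target trials.
targets = [2.0, 1.5, 0.5, 3.0]
nontargets = [-1.0, 0.0, 1.0, -2.0]
points = det_points(targets, nontargets)
eer = equal_error_rate(points)
det_curve = [(probit(p_fa), probit(p_miss)) for _, p_fa, p_miss in points]
print(eer)
```

Plotting `det_curve` for two systems or two recording conditions on the same axes gives the kind of comparison a DET plot is used for.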

DET-Plots make it possible to compare different speaker and signal conditions, such as the minimum required duration of the speech signal, transmission channel characteristics and manner of speaking. They are also suitable for comparing the performance of different recognition methods. The current work aims at a systematic study of the boundary conditions that influence the error rate. Those investigated include sound quality, duration of the voice signal, language dependence, and the variability between and within speakers. At present, considerable differences are observed between results obtained from laboratory data and from field data.
