ICASSP 2013: The ERBlet Transform

This page provides resources and complementary results for the research article:

"The ERBlet Transform: An Auditory-Based Time-Frequency Representation with Perfect Reconstruction"

T. Necciari, P. Balazs, N. Holighaus, and P. Søndergaard

presented at the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013). A PDF version of the article is available here for download.

Abstract: This paper describes a method for obtaining a perceptually motivated and perfectly invertible time-frequency representation of a sound signal. Based on frame theory and the recent non-stationary Gabor transform, a linear representation with resolution evolving across frequency is formulated and implemented as a non-uniform filterbank. To match the human auditory time-frequency resolution, the transform uses Gaussian windows equidistantly spaced on the psychoacoustic "ERB" frequency scale. Additionally, the transform features adaptable resolution and redundancy. Simulations showed that perfect reconstruction can be achieved using fast iterative methods and preconditioning even using one filter per ERB and a very low redundancy (1.08). Comparison with a linear gammatone filterbank showed that the ERBlet approximates well the auditory time-frequency resolution.

Complementary results:

ERBlet windows representation: K = 35 ERBlet filters computed for the frequency range 0-8 kHz using V = 1 filter/ERB.
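As a rough check on the filter count above, the following Python sketch places center frequencies at equal steps on the ERB scale. It assumes the Glasberg and Moore (1990) ERB-number formula, which this page does not state explicitly, and a counting convention (one filter per 1/V ERB step starting at 0 Hz, plus one filter pinned at the upper band edge) that is an assumption chosen for illustration. The function names are hypothetical and are not taken from the authors' scripts.

```python
import math

# Assumption: Glasberg & Moore (1990) ERB-number scale. This page does not
# state which formula the authors' scripts use.
def hz_to_erb(f_hz):
    """Map a frequency in Hz to its ERB-number."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_to_hz(erb):
    """Inverse mapping: ERB-number back to frequency in Hz."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_spaced_centers(f_max_hz, filters_per_erb=1):
    """Hypothetical helper: centers every 1/V ERB from 0 Hz, plus the band edge."""
    n_steps = math.ceil(filters_per_erb * hz_to_erb(f_max_hz))
    centers = [erb_to_hz(k / filters_per_erb) for k in range(n_steps)]
    centers.append(f_max_hz)  # last filter pinned at the upper band edge
    return centers

centers = erb_spaced_centers(8000.0, filters_per_erb=1)
print(len(centers))  # 35 under the assumptions above
```

Under these assumptions 8 kHz corresponds to roughly 33.3 ERB, so V = 1 gives 34 equally spaced centers plus the band-edge filter, which is consistent with K = 35.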

Comparison between ERBlet and other representations. Simulations were performed on a 5-second musical excerpt from the band Manowar (song "Heart of Steel", studio version) in mono format, sampled at 44.1 kHz, 16 bits/sample. All analyses considered the frequency band 0-22.05 kHz.

redundancy = 12, relative reconstruction error < 1e-15

redundancy = 11.80, relative reconstruction error < 1e-15

redundancy = 12, relative reconstruction error < 1e-15; Implementation in [1].

redundancy = 128, relative reconstruction error = 1.4 for a delay of 4 ms and no post-processing correction of the filterbank delay. Accounting for the filterbank delay at the output of the re-synthesizer module led to relative reconstruction errors of 4.11e-1, 1.01e-1, and 2.86e-3 for delays of 4, 8, and 16 ms, respectively; Implementation in [2].
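The two figures of merit quoted above are not defined on this page. The sketch below assumes the common definitions: redundancy as the number of transform coefficients divided by the number of signal samples, and relative reconstruction error as the Euclidean norm of the difference between original and reconstruction divided by the norm of the original. The function names are hypothetical.

```python
import math

def redundancy(num_coefficients, num_samples):
    """Assumed definition: coefficients produced per input sample."""
    return num_coefficients / num_samples

def relative_error(original, reconstructed):
    """Assumed definition: ||f - f_rec||_2 / ||f||_2."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, reconstructed)))
    norm = math.sqrt(sum(a ** 2 for a in original))
    return diff / norm

# A 5-second excerpt at 44.1 kHz has 220500 samples, so a redundancy of 12
# would correspond to 2646000 coefficients under this definition.
num_samples = 5 * 44100
print(redundancy(12 * num_samples, num_samples))

# An uncompensated delay alone can push the error above 1: here the
# "reconstruction" is the same square wave shifted by one sample.
signal = [1.0, -1.0] * 8
delayed = signal[-1:] + signal[:-1]
print(relative_error(signal, delayed))
```

With these definitions, an error above 1 means the difference signal carries more energy than the original, which is what a pure time shift can produce even when the waveform itself is intact.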

Algorithms (pseudo-code):

Matlab/Octave scripts available for download.

Archive content:

Scripts for computing the ERBlet transform and its inverse. Includes the iterative reconstruction using the conjugate gradients method (Algorithm 1 above).
Scripts for generating the figures 1 and 2 presented in the manuscript.
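To illustrate the idea behind iterative reconstruction with conjugate gradients, here is a minimal Python sketch on a toy example. It is not the authors' Algorithm 1 and omits the preconditioning mentioned in the abstract. It assumes the analysis step can be written as a matrix A applied to the signal, and recovers the signal from its coefficients c by solving the normal equations (A^T A) f = A^T c. All names are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_reconstruct(A, coeffs, tol=1e-12, max_iter=100):
    """Solve (A^T A) f = A^T c with plain (unpreconditioned) conjugate gradients."""
    At = transpose(A)
    b = matvec(At, coeffs)
    f = [0.0] * len(b)
    r = b[:]                     # residual b - S f for the initial guess f = 0
    p = r[:]                     # first search direction
    rs_old = dot(r, r)
    b_norm2 = dot(b, b)
    if b_norm2 == 0.0:
        return f
    for _ in range(max_iter):
        if rs_old <= tol ** 2 * b_norm2:
            break                # residual small relative to the right-hand side
        Sp = matvec(At, matvec(A, p))
        alpha = rs_old / dot(p, Sp)
        f = [fi + alpha * pi for fi, pi in zip(f, p)]
        r = [ri - alpha * si for ri, si in zip(r, Sp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return f

# Toy analysis operator: 3 analysis vectors for a 2-sample signal (redundancy 1.5).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
f_true = [2.0, -1.0]
coeffs = matvec(A, f_true)
f_rec = cg_reconstruct(A, coeffs)
print(f_rec)
```

In a real filterbank the matrix A would never be formed explicitly; the products with A and its transpose would be replaced by the analysis and synthesis operations of the transform.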

IMPORTANT NOTE: The Matlab/Octave toolboxes Linear Time-Frequency Analysis (LTFAT, version 1.2.0 and above) [3] and Auditory Modeling (AM) must be installed to run the ERBlet code. These toolboxes are freely available on SourceForge.

References:

[1] G. A. Velasco, N. Holighaus, M. Dörfler, and T. Grill, "Constructing an invertible constant-Q transform with nonstationary Gabor frames", in Proceedings of the 14th International Conference on Digital Audio Effects (DAFx-11), Paris, France, September 19-23 2011, pp. 93–99.
[2] V. Hohmann, "Frequency analysis and synthesis using a gammatone filterbank", Acta Acust. united Ac., vol. 88, no. 3, pp. 433–442, 2002.
[3] P. L. Søndergaard, B. Torrésani, and P. Balazs, "The linear time-frequency analysis toolbox", Int. J. Wavelets. Multi., vol. 10, no. 4, pp. 1250032, July 2012.
