Contribution

Cepstrum-Based Envelope Estimation using Deep Recurrent Neural Networks for Speech Reconstruction

Authors

* Presenting author

Day / Time: 16.08.2021, 10:20-10:40

Room: Schubert 3

Type: Regular talk

Session: Speech acoustics: measurements, processing, intelligibility, and assessment 1

Article ID:

DOI: https://doi.org/

Abstract: Classical speech enhancement algorithms mostly consist of digital filters that rely on a statistical estimate of the noise power spectrum. Because of their underlying constraints, e.g. the assumption of stationary noise, they can yield unsatisfactory results, especially at high noise levels. Machine learning is used for the challenging task of speech enhancement, and in particular for reconstructing speech components that are masked by noise. For speech reconstruction, the source-filter model of speech production is applied: the excitation signal (produced by the human glottis) and the so-called envelope (the filter characteristics of the vocal tract) are estimated separately and subsequently combined. In this work, a deep recurrent neural network (RNN) is applied to estimate a clean envelope in the cepstral domain. Finally, the proposed method is evaluated objectively, using e.g. the log-spectral distance.
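To illustrate what an envelope in the cepstral domain and the log-spectral distance refer to, the following is a minimal sketch in Python with NumPy. It is not the authors' method and contains no neural network: it only shows the common textbook way of obtaining a spectral envelope by keeping the low-quefrency part of the real cepstrum, and of comparing two log-magnitude envelopes. The function names, the FFT size, the number of retained cepstral coefficients, and the Hann window are all assumptions chosen for illustration.

```python
import numpy as np


def cepstral_envelope(frame, n_fft=512, n_ceps=30):
    """Return the natural-log magnitude envelope of one frame (hypothetical helper).

    The envelope is obtained by low-quefrency liftering of the real cepstrum.
    """
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed, n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # small offset avoids log(0)

    # Real cepstrum: inverse FFT of the log-magnitude spectrum (symmetric sequence).
    cepstrum = np.fft.irfft(log_mag, n_fft)

    # Keep the first n_ceps coefficients and their mirrored counterparts.
    lifter = np.zeros(n_fft)
    lifter[:n_ceps] = 1.0
    if n_ceps > 1:
        lifter[-(n_ceps - 1):] = 1.0

    # Transform back to obtain the smoothed log-magnitude spectrum.
    return np.fft.rfft(cepstrum * lifter).real


def log_spectral_distance(log_env_a, log_env_b):
    """Root-mean-square difference in dB between two natural-log magnitude spectra."""
    diff_db = (20.0 / np.log(10.0)) * (log_env_a - log_env_b)
    return np.sqrt(np.mean(diff_db ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(400)               # stand-in for a clean speech frame
    noisy = clean + 0.5 * rng.standard_normal(400)  # stand-in for a noisy frame
    lsd = log_spectral_distance(cepstral_envelope(clean), cepstral_envelope(noisy))
    print(f"LSD between clean and noisy envelope: {lsd:.2f} dB")
```

In a setup like the one described in the abstract, a network would presumably map the cepstral coefficients of the noisy frame to an estimate of the clean ones; the distance function above could then compare the estimated and the reference envelope frame by frame.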