Perceptual and Squared Error Aspects in Speech and Audio Coding

University dissertation from Stockholm : Signaler, sensorer och system

Abstract: In the process of quantization, speech and audio signals are changed. This thesis contains four papers concerned with comparing and minimizing different measures to quantify the changes introduced. Before quantization the signal can be transformed to another domain. Transforms related to the discrete Fourier transform allow for efficient quantization. The complex coefficients from these transforms are typically represented by their amplitudes and phases. Papers A and B describe a new method to quantize the amplitudes and phases with scalar polar quantizers. The method is denoted as multi-variate block polar quantization (MBPQ) and is optimized to minimize the average weighted squared error, utilizing high-rate derivations. It is shown that MBPQ outperforms other polar quantizers for all types of data considered. While the perceptual importance of the amplitude values is well established, the perceptual importance of the phase values is still discussed. In paper C, two distortion measures quantifying the detectability of phase distortions are found and verified. Utilizing these distortion measures, it is investigated how well the squared error describes the perception of phase distortions. It was found that the average perceptual distortion reduces only moderately with increasing rate for both vector quantizers minimizing a weighted squared error and vector quantizers minimizing a perceptual distortion measure. It is concluded that future research should focus on finding parameters that describe the features contained in phase and lead to more efficient quantization. Paper D investigates perceptual distortion measures in the most commonly used coder paradigm for speech coding: linear-prediction-based analysis-by-synthesis. In the paper, an auditory model based distortion measure is compared to a commonly used weighted squared error derived from the linear prediction coefficients.
It is concluded that sophisticated auditory models are rarely used in this type of coder because the commonly used weighted squared error already performs well.
