ISSN E 2409-2770
ISSN P 2521-2419

Speech Sources Separation Based on Models of Interaural Parameters and Spatial Properties of Room



Vol. 7, Issue 01, pp. 22-26, January 2020

DOI: https://doi.org/10.34259/ijew.20.7012226

Keywords: MESSL, Spatial covariance, SDR, PESQ, MESSL+SC



This paper presents an extended evaluation, over different reverberant scenarios, of a model that combines spatial covariance with interaural parameters to improve the performance of MESSL (Model-based Expectation-Maximization Source Separation and Localization). Speech mixtures are tested with the interferer placed at one of six angles {15°, 30°, 45°, 60°, 75°, 90°}. The binaural spatial parameters, namely the interaural phase difference (IPD) and the interaural level difference (ILD), together with the spatial covariance, are modeled in the short-time Fourier transform domain, and the parameters of the model are updated with the expectation-maximization (EM) algorithm. Performance is measured in terms of signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ), and the results confirm that the proposed model improves separation in highly reverberant rooms.
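The interaural cues the abstract refers to can be computed per time-frequency bin from the left and right channel STFTs. The sketch below, a hypothetical illustration rather than the authors' implementation, derives the IPD from the phase of the cross-spectrum and the ILD from the magnitude ratio in dB, and checks them on a toy signal whose right channel is a pure delay of the left (so the ILD is 0 dB and the IPD grows linearly with frequency).

```python
import numpy as np

def interaural_params(stft_l, stft_r, eps=1e-12):
    """IPD (radians) and ILD (dB) per time-frequency bin.

    stft_l, stft_r: complex arrays (freq x frames) for left/right channels.
    Hypothetical helper; MESSL models distributions over these quantities.
    """
    cross = stft_l * np.conj(stft_r)
    ipd = np.angle(cross)                                  # phase difference
    ild = 20.0 * np.log10((np.abs(stft_l) + eps) /
                          (np.abs(stft_r) + eps))          # level difference
    return ipd, ild

# Toy check: right channel is the left delayed by 3 samples.
n = 256
delay = 3
x = np.random.default_rng(0).standard_normal(n)
X = np.fft.rfft(x)
k = np.arange(X.size)
Xr = X * np.exp(-2j * np.pi * k * delay / n)   # delay applied in frequency
ipd, ild = interaural_params(X[:, None], Xr[:, None])
# For low bins (before phase wrapping), ipd[k] == 2*pi*k*delay/n; ild == 0 dB.
```

In MESSL-style models these per-bin observations are treated as noisy samples from source- and delay-dependent distributions, whose parameters the EM algorithm re-estimates on each iteration.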


Muhammad Israr: Department of Electrical Engineering, University of Engineering and Technology, Peshawar, Pakistan

Muhammad Salman Khan: Department of Electrical Engineering, University of Engineering and Technology, Peshawar, Pakistan

Khushal Khan: Department of Electrical Engineering, University of Engineering and Technology, Peshawar, Pakistan


Muhammad Israr, Muhammad Salman Khan, and Khushal Khan, "Speech Sources Separation Based on Models of Interaural Parameters and Spatial Properties of Room," International Journal of Engineering Works, Vol. 7, Issue 01, pp. 22-26, January 2020, https://doi.org/10.34259/ijew.20.7012226


 [1]     M. I. Mandel, R. J. Weiss, and D. P. W. Ellis, “Model-Based Expectation-Maximization Source Separation and Localization,” IEEE Trans. Audio, Speech Lang. Process., vol. 18, no. 2, pp. 382–394, 2010.

[2]          A. Ephrat et al., “Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation,” vol. 37, no. 4, 2018.

[3]          N. Roman, D. Wang, and G. J. Brown, “Speech segregation based on sound localization,” J. Acoust. Soc. Am., vol. 114, no. 4, pp. 2236–2252, 2003.

[4]          M. L. Thesis, “Blind Single Channel Sound Source Separation Mark Leddy B . Sc , M . Sc Dublin Institute of Technology Supervisors : Dan Barry , David Dorran , Eugene Coyle,” 2010.

[5]          N. Hassan and D. A. Ramli, “A Comparative study of Blind source separation for Bioacoustics sounds based on FastICA, PCA and NMF,” Procedia Comput. Sci., vol. 126, pp. 363–372, 2018.

[6]          N. Q. K. Duong, E. Vincent, and R. Gribonval, “Under-determined reverberant audio source separation using a full-rank spatial covariance model,” IEEE Trans. Audio, Speech Lang. Process., vol. 18, no. 7, pp. 1830–1840, 2010.

[7]          S. Rickard, “The DUET Blind Source Separation Algorithm,” pp. 217–241, 2007.

[8]          V. S. Narayanaswamy, S. Katoch, J. J. Thiagarajan, H. Song, and A. Spanias, “Audio Source Separation via Multi-Scale Learning with Dilated Dense U-Nets,” 2019.

[9]          X. F. Gong, Q. H. Lin, F. Y. Cong, and L. De Lathauwer, “Double Coupled Canonical Polyadic Decomposition for Joint Blind Source Separation,” IEEE Trans. Signal Process., vol. 66, no. 13, pp. 3475–3490, 2018.

[10]        S. Rickard and O. Yilmaz, “Blind Separation of Speech Mixtures via Time-Frequency Masking,” IEEE Trans. Signal Process., vol. 52, no. 7, pp. 1830–1847, 2004.

[11]        T. Gustafsson, B. D. Rao, and M. Trivedi, “Source Localization in Reverberant Environments : Part I - Modeling,” vol. 11, no. 6, pp. 1–22, 2003.

[12]        T. Esch and P. Vary, “Efficient musical noise suppression for speech enhancement systems,” ICASSP, IEEE Int. Conf. Acoust. Speech Signal Process. - Proc., vol. 2, pp. 4409–4412, 2009.

[13]        Z. Rafii, A. Liutkus, F. R. Stoter, S. I. Mimilakis, D. Fitzgerald, and B. Pardo, “An Overview of Lead and Accompaniment Separation in Music,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 26, no. 8, pp. 1307–1335, 2018.

[14]        M. Jia, J. Sun, C. Bao, and C. Ritz, “Separation of multiple speech sources by recovering sparse and non-sparse components from B-format microphone recordings,” Speech Commun., vol. 96, no. May 2017, pp. 184–196, 2018.

[15]        S. Smita, S. Biswas, and S. S. Solanki, “Audio Signal Separation and Classification: A Review Paper,” Int. J. Innov. Res. Comput. Commun. Eng., vol. 2, no. 11, pp. 6960–6966, 2014.

[16]        N. Q. K. Duong, E. Vincent, and R. Gribonval, “Under-determined convolutive blind source separation using spatial covariance models,” ICASSP, IEEE Int. Conf. Acoust. Speech Signal Process. - Proc., pp. 9–12, 2010.

[17]        M. S. Khan, S. M. Naqvi, and J. Chambers, “Two-stage audio-visual speech dereverberation and separation based on models of the interaural spatial cues and spatial covariance,” 2013 18th Int. Conf. Digit. Signal Process. DSP 2013, 2013.

[18]        B. Schuller, “【Metrics】Performance Measurement in Blind Audio Source Separation,” vol. 14, no. 4, pp. 139–147, 2013.

[19]        J. J. Thiagarajan, K. Natesan Ramamurthy, and A. Spanias, “Mixing matrix estimation using discriminative clustering for blind source separation,” Digit. Signal Process. A Rev. J., vol. 23, no. 1, pp. 9–18, 2013.

[20]        K. Yatabe and D. Kitamura, “Determined Blind Source Separation via Proximal Splitting Algorithm,” ICASSP, IEEE Int. Conf. Acoust. Speech Signal Process. - Proc., vol. 2018-April, no. 4, pp. 776–780, 2018.

[21]        A. Tsilfidis, E. Georganti, and J. Mourjopoulos, “Binaural extension and performance of single-channel spectral subtraction dereverberation algorithms,” ICASSP, IEEE Int. Conf. Acoust. Speech Signal Process. - Proc., no. 5, pp. 1737–1740, 2011.