Title – Speaker Diarization: A Review and Analysis

Author(s) – Aishwarya Balwani, Omkar Chogle, Shubhankar Kulkarni

Country – India

Abstract – Speaker diarization has garnered considerable attention over the past few years, and the audio and speech processing communities have carried out a large amount of research on it. The aim of speaker diarization is to answer the question ‘who spoke when?’ Speaker diarization makes use of speaker recognition, achieved through speaker segmentation, which determines the points in time at which the speaker changes. Speaker clustering then groups speech segments on the basis of speaker characteristics. This paper gives a general review of speaker diarization as a field of study and briefly examines some commonly used techniques, namely Bayesian models, Gaussian mixture models and hidden Markov models. It then reviews different types of clustering and the recent, increasing use of i-vectors for unsupervised calibration, analyzes these approaches, and compares their viability.

Keywords – Speaker diarization, speaker segmentation, speaker clustering, Bayesian models, hidden Markov models, Gaussian mixture models, i-vectors

Full Text – Download PDF RJ010302