Automatic speaker recognition system supported by behavioral features of speech signal

Authors

  • Dominik Mały Military University of Technology, Faculty of Electronics, Institute of Radioelectronics, ul. Sylwestra Kaliskiego 2, 00-908 Warsaw, Poland
  • Andrzej Dobrowolski Military University of Technology, Faculty of Electronics, Institute of Electronic Systems, ul. Sylwestra Kaliskiego 2, 00-908 Warsaw, Poland
  • Kamil Kamiński Military University of Technology, Institute of Optoelectronics, ul. Sylwestra Kaliskiego 2, 00-908 Warsaw, Poland

DOI:

https://doi.org/10.24425/mms.2026.155813

Abstract

Extracting and interpreting personal information from speech signals, using a range of technical solutions, is a key function of Automatic Speaker Recognition (ASR) systems. Speech conveys information such as language, dialect, and emotion, and the growing demand for human-computer interaction and biometric security in both the military and civilian sectors makes ASR systems increasingly important. Voice, as a unique human characteristic, enables identification without additional attributes that can be lost or destroyed. However, while humans recognize voices naturally, machines face significant computational challenges, and many of these remain open despite advances in the field. This article addresses the use of behavioral voice features in ASR systems. The authors aimed to develop a set of behavioral features and implement it in an existing ASR system in order to increase the number of correct speaker identifications, particularly in the presence of various types of noise. Using the publicly available LibriSpeech voice database made it possible to compare the developed solution with other ASR systems. In addition, the authors developed a solution that can reduce the impact of external noise on speaker recognition accuracy. The key element proved to be an innovative data integration method that leverages the advantages of several sources of distinctive feature sets. In the experiments conducted on the LibriSpeech database, the proposed ASR system achieved an identification rate above 99% for the train-clean-100 subset and close to 99% for the train-clean-360 subset. Compared with traditional Gaussian Mixture Model (GMM) approaches, which typically achieve about 83% accuracy, the developed solution provides a substantial improvement and performs comparably to, or even better than, modern deep-learning-based techniques (approximately 98 to 99%).

Published

2026-06-02

How to Cite

Mały, Dominik, et al. “Automatic Speaker Recognition System Supported by Behavioral Features of Speech Signal”. Metrology and Measurement Systems, vol. 33, no. 1, June 2026, pp. 1-22, doi:10.24425/mms.2026.155813.

Section

Articles