Automatic speaker recognition system supported by behavioral features of speech signal
DOI:
https://doi.org/10.24425/mms.2026.155813
Abstract
The extraction and interpretation of personal data from speech signals is the key function of Automatic Speaker Recognition (ASR) systems. Speech conveys information such as language, dialect, and emotion, and the growing demand for human-computer interaction and biometric security applications in both the military and civilian sectors makes ASR systems increasingly essential. Voice, as a unique human characteristic, enables identification without additional attributes that can be lost or destroyed. However, while humans recognize voices naturally, machines face significant computational challenges, and many of these remain open despite advances in the field. This article addresses the use of behavioral voice features in ASR systems. The authors aimed to develop a set of behavioral features and implement it in an existing ASR system so as to increase the number of correct speaker identifications, particularly in the presence of various types of noise. Using the publicly available LibriSpeech voice database made it possible to compare the developed solution with other ASR systems. In addition, the authors developed a solution that can reduce the impact of external noise on speaker recognition accuracy. The key element proved to be an innovative data integration method that leverages the advantages of several sources of distinctive feature sets. In experiments on the LibriSpeech database, the identification rate of the proposed ASR system exceeded 99% for the train-clean-100 subset and was close to 99% for the train-clean-360 subset.
Compared with traditional Gaussian Mixture Model (GMM) approaches, which typically achieve about 83% accuracy, the developed solution provides a substantial improvement, and its performance is comparable to, or even exceeds, that of modern deep-learning-based techniques (approximately 98 to 99%).
License
Copyright (c) 2026 Metrology and Measurement Systems

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.