Automatic speaker recognition system supported by behavioral features of speech signal

Dominik Mały; Andrzej Dobrowolski; Kamil Kamiński

doi:10.24425/mms.2026.155813

Authors

Dominik Mały, Military University of Technology, Faculty of Electronics, Institute of Radioelectronics, ul. Sylwestra Kaliskiego 2, 00-908 Warsaw, Poland
Andrzej Dobrowolski, Military University of Technology, Faculty of Electronics, Institute of Electronic Systems, ul. Sylwestra Kaliskiego 2, 00-908 Warsaw, Poland
Kamil Kamiński, Military University of Technology, Institute of Optoelectronics, ul. Sylwestra Kaliskiego 2, 00-908 Warsaw, Poland

DOI

https://doi.org/10.24425/mms.2026.155813

Keywords

automatic speaker recognition, behavioral features, data fusion, distinct feature selection, genetic algorithm

Abstract

Automatic speaker recognition (ASR) systems extract and interpret personal information carried by the speech signal. Speech conveys information such as language, dialect, and emotion, and the growing demand for human-computer interaction and biometric security in both the military and civilian sectors makes ASR systems increasingly important. Voice, as a unique human characteristic, enables identification without additional attributes that can be lost or destroyed. However, while humans recognize voices naturally, machines face significant computational challenges, and many of these remain open despite recent advances. This article addresses the use of behavioral voice features in ASR systems. The authors aimed to develop a set of behavioral features and implement it in an existing ASR system so as to increase the number of correct speaker identifications, particularly in the presence of various types of noise. Using the publicly available LibriSpeech voice database made it possible to compare the developed solution with other ASR systems. In addition, the authors developed a solution that can reduce the impact of external noise on speaker recognition accuracy. The key element proved to be an innovative data integration method that leverages the advantages of several sources of distinctive feature sets. In the experiments conducted on LibriSpeech, the identification rate exceeded 99% for the train-clean-100 subset and was close to 99% for the train-clean-360 subset. Compared with traditional Gaussian Mixture Model (GMM) approaches, which typically achieve about 83% accuracy, this is a substantial improvement, and the performance is comparable to, or even exceeds, that of modern deep-learning-based techniques (approximately 98 to 99%).
