Deep Learning and Fourier Transform for Speaker Recognition (DLFSR)

Sayed, Taqwa Mahmoud; Gody, Amr; Muhammad, Sayed T.

doi:10.21608/fuje.2024.313518.1090

Fayoum University Journal of Engineering
Volume 8, Issue 1, January 2025, Pages 143-151
Document Type: Original Article
Authors
Taqwa Mahmoud Sayed^* ¹; Amr Gody²; Sayed T. Muhammad³
¹tamiyyah-fayoum-egypt tamiyyah.fayoum.egypt
²Kyman Faryes Faculty of engineering
³Computers and Systems Engineering Department, Faculty of Engineering, Fayoum University, Fayoum, Egypt
Abstract
Automatic speaker recognition (ASR) and verification have gained increasing visibility and significance in society as speech technologies. Deep learning techniques, specifically deep neural networks (DNNs), have revolutionized speaker recognition. Models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) can learn discriminative features directly from unprocessed speech signals, without manual feature extraction. End-to-end speaker recognition models are increasingly widely used because they perform well and can map speech waveforms directly to speaker identities. Speaker recognition can identify and authenticate people based on their distinct vocal traits. Its applications include voice-based authentication on digital devices, forensic analysis of audio recordings, access control, and caller identification in phone-based customer support. In this study, we introduce a Deep Learning and Fourier Transform for Speaker Recognition (DLFSR) model based on the Short-Time Fourier Transform (STFT): the input speech is transformed into a spectrogram, and a convolutional neural network (CNN) is then applied to the spectrogram images to extract features and classify the speaker. Training and validation are carried out on the speaker recognition dataset 16000pcm. The model achieves 98.8% correct identification and classification.
Keywords
ASR; STFT; CNN; RNN; DLFSR; pcm dataset
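
The first stage described in the abstract, turning speech into a spectrogram with the STFT, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name stft_spectrogram, the 16 kHz sample rate (inferred from the dataset name), the 25 ms Hann window, and the 10 ms hop are all assumptions chosen for illustration, and a synthetic 1 kHz tone stands in for a speech clip.

```python
import numpy as np


def stft_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (n_freq_bins, n_frames).

    frame_len and hop are illustrative assumptions (25 ms and 10 ms at an
    assumed 16 kHz sample rate); the paper's actual settings are not given
    in the abstract.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad very short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))

    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)

    # Index matrix that slices the signal into overlapping frames,
    # giving an array of shape (n_frames, frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window

    # One-sided FFT of each windowed frame: the short-time Fourier transform.
    spectrum = np.fft.rfft(frames, axis=1)

    # Log-magnitude in dB; eps avoids log(0) on silent frames.
    log_mag = 20.0 * np.log10(np.abs(spectrum) + eps)
    return log_mag.T


# Synthetic stand-in for a one-second speech clip: a 1 kHz tone.
sample_rate = 16000  # assumed from the dataset name "16000pcm"
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = stft_spectrogram(tone)
peak_bin = int(np.argmax(spec.mean(axis=1)))
peak_hz = peak_bin * sample_rate / 400  # bin spacing = sample_rate / frame_len
print(spec.shape, peak_hz)
```

The resulting 2-D array is what would be rendered as a spectrogram image and passed to a CNN classifier. The network itself is omitted here because the abstract does not specify its architecture.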
