Comparative Study on End-to-End Speech Recognition Using Pre-trained Models
Fayoum University Journal of Engineering
Volume 8, Issue 1, January 2025, Pages 131-142
Document Type: Original Article
DOI: 10.21608/fuje.2024.312102.1089
Authors
Martha F. Ghobrial
1 Electronics and Communication Department, Fayoum University, Fayoum, Egypt
2 Kyman Faryes Faculty of Engineering
3 Computers and Systems Engineering Department, Faculty of Engineering, Fayoum University, Fayoum, Egypt
Abstract
Pre-trained models (PTMs) are widely available in the field of speech and audio signal processing. A PTM provides a set of initial weights and biases that can be fine-tuned for a particular task, which makes it a popular starting point for machine-learning model development. Representations from pre-trained models have achieved state-of-the-art performance in speech recognition, natural language processing, and other applications, and embeddings extracted from these models serve as inputs to learning algorithms for a variety of downstream tasks. This study compares pre-trained models to show how they perform in Automatic Speech Recognition (ASR). The literature indicates that self-supervised models based on Wav2Vec2.0 and fully supervised models such as Whisper are the dominant paradigms and approaches for ASR at present. This study evaluates and compares these strategies to assess how well they perform across a wide range of test scenarios, and aims to serve as a practical guide to understanding, using, and building PTMs for speech and NLP tasks.
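To make the two paradigms concrete, the following is a minimal sketch (not the authors' experimental setup) that transcribes the same utterance with a Wav2Vec2.0 CTC model and with Whisper via the Hugging Face `transformers` library, then scores each hypothesis with word error rate (WER) using the `jiwer` package. The checkpoint names are public Hugging Face model IDs; the audio path and reference transcript are illustrative assumptions.

```python
# Minimal sketch: comparing a self-supervised (Wav2Vec2.0) and a fully
# supervised (Whisper) ASR model on one utterance. Checkpoint names are
# public Hugging Face IDs; the audio path and reference are assumptions.
import torch
from transformers import pipeline
from jiwer import wer

AUDIO_PATH = "sample.wav"          # assumed: 16 kHz mono WAV file
REFERENCE = "the quick brown fox"  # assumed ground-truth transcript

device = 0 if torch.cuda.is_available() else -1

# Self-supervised paradigm: Wav2Vec2.0 pre-trained on unlabeled audio,
# then fine-tuned with a CTC head on LibriSpeech (960 h).
wav2vec2_asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/wav2vec2-base-960h",
    device=device,
)

# Fully supervised paradigm: Whisper, trained end-to-end on a large
# corpus of labeled (audio, transcript) pairs.
whisper_asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base",
    device=device,
)

for name, asr in [("Wav2Vec2.0", wav2vec2_asr), ("Whisper", whisper_asr)]:
    hypothesis = asr(AUDIO_PATH)["text"]
    # Wav2Vec2 checkpoints emit uppercase text, so normalize before scoring.
    print(f"{name}: {hypothesis!r}  WER={wer(REFERENCE, hypothesis.lower()):.2f}")
```

In a fuller comparison of the kind the paper describes, the same loop would run over an entire test set and the WERs would be averaged per model, so that the self-supervised and fully supervised paradigms can be contrasted across test scenarios.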
Keywords
PTMs; ASR; Wav2Vec2; Whisper; Speech Recognition; Natural Language Processing