Comparative Evaluation of Pretrained Transformer Models for Named Entity Recognition in Informal Pharmacological Texts
Kafrelsheikh Journal of Information Sciences
Articles in Press, Accepted Manuscript, Available Online from 04 July 2025
Document Type: Original Article
DOI: 10.21608/kjis.2025.391353.1027
Authors
Radman Abdollahi 1; Sajed Sarabandi 2
1 Shahid Beheshti High School of Exceptional Talents, Bushehr, Iran
2 Department of Computer Science, Leiden University, Leiden, Netherlands
Abstract
Named Entity Recognition (NER) models are widely used across domains to analyze and understand textual data. Their applications are particularly significant in the pharmacological and biomedical fields, where extracting relevant entities is crucial. A key factor in developing an effective NER model is the selection of an appropriate pretrained model for transfer learning. This study compares the performance of four commonly used domain-specific pretrained models (ClinicalBERT, BioBERT, SciBERT, and BLURB) with a baseline BERT model. We trained and tested these models on a dataset of user-written opinions about different drugs, composed in everyday language rather than academic or technical prose. Our findings show that domain-specific models whose vocabularies align with the base BERT model, such as ClinicalBERT and BioBERT, significantly improve performance. However, models like SciBERT and BLURB, which are trained primarily on academic papers, perform poorly when applied to everyday language. These insights highlight the importance of selecting pretrained models that are not only domain-specific but also suited to the linguistic characteristics of the target text.
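The comparison described in the abstract amounts to swapping different pretrained checkpoints into an otherwise identical token-classification fine-tuning pipeline. The sketch below illustrates one way this could be set up with the Hugging Face Transformers library; the checkpoint identifiers and the entity label set are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch (not the authors' exact setup) of comparing pretrained
# checkpoints for NER by loading each one into the same token-classification
# architecture. Checkpoint IDs and the label set are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Baseline BERT plus the four domain-specific models compared in the study.
CHECKPOINTS = {
    "BERT": "bert-base-cased",
    "ClinicalBERT": "emilyalsentzer/Bio_ClinicalBERT",
    "BioBERT": "dmis-lab/biobert-base-cased-v1.1",
    "SciBERT": "allenai/scibert_scivocab_cased",
    "BLURB": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract",
}

# Hypothetical BIO tag set for drug-related entities in user-written reviews.
LABELS = ["O", "B-DRUG", "I-DRUG", "B-EFFECT", "I-EFFECT"]

def load_for_ner(checkpoint: str):
    """Load a tokenizer and a token-classification head for one checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(
        checkpoint, num_labels=len(LABELS)
    )
    return tokenizer, model

# Each candidate would then be fine-tuned and evaluated with an identical
# training loop, so that only the pretrained weights and vocabulary differ.
for name, ckpt in CHECKPOINTS.items():
    tokenizer, model = load_for_ner(ckpt)
    print(name, model.config.num_labels)
```

Keeping the downstream training loop fixed across checkpoints isolates the effect of the pretrained weights and vocabulary, which is the comparison the study reports.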
Keywords
Named Entity Recognition (NER); Pretrained Transformer Models; Transfer Learning; Pharmacology