Towards Ontology-Based Web Text Document Classification
International Conference on Aerospace Sciences and Aviation Technology
Article 61, Volume 17, AEROSPACE SCIENCES & AVIATION TECHNOLOGY, ASAT - 17 – April 11 - 13, 2017, April 2017, Page 1-8
Document Type: Original Article
DOI: 10.21608/asat.2017.22749
Authors
Mohamed K. Elhadad; Khaled M. Badran; Gouda I. Salama
Egyptian Armed Forces, Egypt.
Abstract
Web data is generally stored in structured, semi-structured, and unstructured formats. Surveys indicate that most of an organization's information is stored in unstructured textual form, so categorizing this large volume of unstructured web text documents has become one of the most important tasks when dealing with the web. Categorization (classification) of web text documents aims to assign one or more class labels (categories) to unlabeled documents; the assignment depends mainly on the content of the document itself, with the help of one or more machine learning techniques. Different learning algorithms have been applied to the content of text documents for classification. In this paper, experiments on a subset of the Reuters-21578 dataset highlight the shortcomings and limitations of traditional techniques for feature generation and dimensionality reduction, and report classification accuracy and F-measure for different classification algorithms.
Keywords
Feature Extraction; natural language processing; web text documents classification; Vector Space Model; KNN; Principal component analysis; dimensionality reduction; term frequency inverse document frequency
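
As a rough illustration of the kind of traditional pipeline the abstract evaluates, the sketch below builds TF-IDF vectors in a vector space model, classifies with KNN using cosine similarity, and reports accuracy and F-measure. It is not the authors' implementation: the toy corpus, the function names, the smoothed IDF variant, and the choice of k = 3 are all assumptions made for illustration, and the dimensionality reduction step (for example PCA) is omitted for brevity.

```python
import math
from collections import Counter

# Toy labelled corpus (hypothetical; stands in for a Reuters-21578 subset).
TRAIN = [
    ("wheat corn harvest grain export", "grain"),
    ("grain shipment wheat tonnes", "grain"),
    ("crude oil barrel opec price", "crude"),
    ("oil refinery crude output", "crude"),
]
TEST = [
    ("wheat grain tonnes export", "grain"),
    ("opec crude oil output", "crude"),
]


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Map a document to a sparse, L2-normalised TF-IDF vector."""
    tf = Counter(term for term in doc.split() if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {term: w / norm for term, w in vec.items()}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def knn_predict(vec, train_vecs, k=3):
    """Majority vote among the k most similar training documents."""
    ranked = sorted(train_vecs, key=lambda item: cosine(vec, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def f_measure(y_true, y_pred, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Fit IDF on training text only, then vectorise and classify the test set.
idf = fit_idf([doc for doc, _ in TRAIN])
train_vecs = [(tfidf(doc, idf), label) for doc, label in TRAIN]
y_true = [label for _, label in TEST]
y_pred = [knn_predict(tfidf(doc, idf), train_vecs) for doc, _ in TEST]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print("predictions:", y_pred)
print("accuracy:", accuracy)
print("F-measure (grain):", f_measure(y_true, y_pred, "grain"))
```

Because every distinct term becomes its own dimension, the vectors grow with the vocabulary and two documents that use different words for the same concept share no features. This is one commonly cited limitation of purely term-based features.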