REAL TIME FIRST STORY DETECTION IN TWITTER USING A MODIFIED TF-IDF ALGORITHM
International Journal of Intelligent Computing and Information Sciences
Article 2, Volume 17, Issue 3, July 2017, Pages 11-31
Document Type: Original Article
DOI: 10.21608/ijicis.2017.8245
Authors
Samar Elbedwehy; M Alrahmawy; Taher Hamza
Computer Science Department, Faculty of Computer and Information Sciences, Mansoura University, Egypt
Abstract
Twitter is a social microblogging service that limits each tweet to a maximum of 140 characters. Even with so few characters per tweet, analyzing the tweets of billions of users poses real-time data processing challenges. One important aspect of social behavior is that it lets us detect the significance of events and the way people react to them. In this paper, we focus on First Story Detection (FSD), which means detecting bursts of tweets that refer to a particular topic. A first story is defined as the first document in a given series of documents to discuss a specific event, which occurred at a particular time and place. TF-IDF (term frequency–inverse document frequency) is an algorithm traditionally used in most text similarity applications, such as FSD. In this paper, we embed a modified version of the TF-IDF algorithm to enhance the accuracy of an existing open-source FSD implementation that uses the Storm platform to benefit from its scalability, efficiency, and robustness in analyzing tweets in real time. The empirical results show significant enhancements in detection accuracy without a noticeable effect on performance.
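To make the role of TF-IDF in FSD concrete, the following is a minimal sketch of a baseline detector. It uses the standard, unmodified TF-IDF weighting with a smoothed IDF and cosine similarity. It is not the modified variant proposed in the paper, whose details the abstract does not give, and it does not use Storm. The function names, the whitespace tokenizer, and the 0.5 similarity threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vector(tokens, doc_freq, n_docs):
    """Standard TF-IDF with a smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    counts = Counter(tokens)
    total = len(tokens)
    vector = {}
    for term, count in counts.items():
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1
        vector[term] = (count / total) * idf
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_first_stories(tweets, threshold=0.5):
    """Return indices of tweets whose best similarity to every earlier
    tweet is below `threshold` (an illustrative value, not from the paper)."""
    doc_freq = Counter()   # number of tweets seen so far containing each term
    seen = []              # token lists of earlier tweets
    first_stories = []
    for index, text in enumerate(tweets):
        tokens = tokenize(text)
        if not tokens:
            continue  # skip empty tweets
        doc_freq.update(set(tokens))
        n_docs = len(seen) + 1
        current = tfidf_vector(tokens, doc_freq, n_docs)
        # Brute-force comparison against all earlier tweets; a real-time
        # system would need an approximate or windowed search instead.
        best = max(
            (cosine(current, tfidf_vector(old, doc_freq, n_docs)) for old in seen),
            default=0.0,
        )
        if best < threshold:
            first_stories.append(index)
        seen.append(tokens)
    return first_stories
```

For example, `detect_first_stories(["earthquake hits city centre", "earthquake hits city centre now", "football final tonight"])` returns `[0, 2]`: the second tweet is close to the first, while the third shares no terms with either. The pairwise comparison is quadratic in the number of tweets, which is why a scalable streaming platform matters for the real-time setting the abstract describes.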
Keywords
Real Time; Similarity Algorithms; Social Media; Information Retrieval; Big Data