Computation time optimization on hashtag segmentation for social media data

Malka N. Halgamuge, Huseyin Caliskan, Azeem Mohammad

Research output: Book chapter/Published conference paper › Conference paper › peer-review

Abstract

Sentiment analysis, the contextual mining of text that recognizes and extracts subjective information from a source, is considered necessary for estimating human behavior. A hashtag is a metadata tag used to classify data into a category. However, there has been little discussion of hashtag segmentation so far. We propose an algorithm that segments hashtags while optimizing computation time. We create candidates from a given corpus containing 1-gram (unigram) and 2-gram (bigram) data. The proposed algorithm reduces the computation time of generating segments by limiting the candidates generated from the given corpus: the fewer candidates there are, the shorter the calculation. In this study, we gather food-related unstructured tweets (N = 951,255) from Twitter. Our results demonstrate that the proposed algorithm reduces computation time by 29.7%. If a segment cannot be found with the proposed algorithm, the original hashtag segmentation method, which identifies all possible candidates, is used as a fallback. The proposed approach improves hashtag segmentation by minimizing computation time and could be used in real-time tweet analysis. Our results show that the sentiment trends for raw data and segmented data are similar, which also supports the method's accuracy. The findings indicate that, even though computers are getting faster, computational resources should still be used effectively. Our work also provides a data collection model for future surveys, which could shorten the data retrieval process through multi-threaded programming.
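The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it describes: first try a restricted candidate set, and fall back to enumerating every possible split only when that set is empty. The toy `UNIGRAMS` and `BIGRAMS` counts, the scoring function, the unknown-word penalty, and all function names are assumptions made for illustration and are not taken from the paper. In particular, "restricted" is interpreted here as "every piece must appear in the unigram corpus", which may differ from the authors' rule.

```python
from functools import lru_cache
from math import log

# Hypothetical toy corpus counts (assumption); a real run would load
# large unigram and bigram frequency tables instead.
UNIGRAMS = {"i": 40, "love": 30, "pizza": 20, "food": 50, "is": 35, "life": 25}
BIGRAMS = {("i", "love"): 12, ("love", "pizza"): 6, ("is", "life"): 4}
TOTAL = sum(UNIGRAMS.values())


def score(words):
    """Log-probability of a segmentation under a simple bigram/unigram model."""
    total, prev = 0.0, None
    for w in words:
        if prev is not None and (prev, w) in BIGRAMS:
            # Conditional probability P(w | prev) from bigram counts.
            total += log(BIGRAMS[(prev, w)] / UNIGRAMS[prev])
        elif w in UNIGRAMS:
            total += log(UNIGRAMS[w] / TOTAL)
        else:
            # Unknown piece: assumed penalty that grows with its length.
            total += log(1.0 / (TOTAL * 10 ** len(w)))
        prev = w
    return total


@lru_cache(maxsize=None)
def restricted_candidates(text):
    """Only splits in which every piece is a known unigram."""
    if not text:
        return ((),)
    out = []
    for i in range(1, len(text) + 1):
        head = text[:i]
        if head in UNIGRAMS:
            for tail in restricted_candidates(text[i:]):
                out.append((head,) + tail)
    return tuple(out)


def all_candidates(text):
    """Every possible split: 2 ** (len(text) - 1) candidates."""
    n = len(text)
    if n == 0:
        return [()]
    results = []
    for mask in range(1 << (n - 1)):
        pieces, start = [], 0
        for i in range(1, n):
            if mask & (1 << (i - 1)):  # bit i-1 set means "cut before index i"
                pieces.append(text[start:i])
                start = i
        pieces.append(text[start:])
        results.append(tuple(pieces))
    return results


def segment(hashtag):
    """Try the restricted candidates first; fall back to the full enumeration."""
    text = hashtag.lstrip("#").lower()
    if not text:
        return []
    candidates = restricted_candidates(text) or all_candidates(text)
    return list(max(candidates, key=score))


print(segment("#ilovepizza"))  # restricted path: one candidate instead of 512
print(segment("#pizzaxyz"))    # no restricted candidate, so the fallback runs
```

With this toy corpus, `#ilovepizza` has a single restricted candidate, whereas the exhaustive method would score 512 splits. That gap is the kind of saving the abstract attributes to limiting candidates, though the 29.7% figure is the authors' measurement and is not reproduced by this sketch.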

Original language: English
Title of host publication: 2021 IEEE Wireless Communications and Networking Conference, WCNC 2021
Publisher: IEEE, Institute of Electrical and Electronics Engineers
ISBN (Electronic): 9781728195056
Publication status: Published - 2021
Event: 2021 IEEE Wireless Communications and Networking Conference, WCNC 2021 - Nanjing, China
Duration: 29 Mar 2021 to 1 Apr 2021

Publication series

Name: IEEE Wireless Communications and Networking Conference, WCNC
Volume: 2021-March
ISSN (Print): 1525-3511

Conference

Conference: 2021 IEEE Wireless Communications and Networking Conference, WCNC 2021
Country/Territory: China
City: Nanjing
Period: 29/03/21 to 01/04/21
