In the big data era, data are generated in real time or near real time, and businesses can use this ever-growing volume of data to support data-driven decision-making. Social media platforms such as Twitter generate an enormous amount of such data. However, social media data are often unstructured and difficult to manage. Hence, this study proposes an effective text data preprocessing technique and develops an algorithm to train Support Vector Machine (SVM), Deep Learning (DL) and Naïve Bayes (NB) classifiers to process Twitter data. The algorithm weights the sentiment score in terms of the weights of the hashtag and the cleaned text. In this study, we (i) compare different preprocessing techniques (stemming, lemmatization and spelling correction) on data collected from Twitter to identify the most efficient method, and (ii) develop an algorithm that weights the scores of the hashtag and the cleaned text to obtain the sentiment. We retrieved N = 1,314,000 Twitter records and compared the popularity of two products, Google Now and Amazon Alexa. Using our data preprocessing algorithm and sentiment weight score algorithm, we trained SVM, DL and NB models. The results show that the stemming technique performed best in terms of computational speed. Additionally, the accuracy of the algorithm was tested against manually sorted sentiments and against sentiments produced before text data preprocessing. The results demonstrated that the output produced by the algorithm was close to the manually annotated sentiments. In terms of model performance, the SVM performed better, with an accuracy of 90.3%, perhaps due to the unstructured nature of Twitter data. Previous studies used conventional techniques; hence, no precise methods were utilized for cleaning the text. Therefore, our approach confirms that a proper text data preprocessing technique plays a significant role in the prediction accuracy and computational time of the classifier when using unstructured Twitter data.
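The idea of weighting a hashtag score against a cleaned-text score can be illustrated with a minimal sketch. The abstract does not give the actual formula, weights or sentiment scorer, so everything here is an assumption made for illustration: the toy `LEXICON`, the helper names `split_tweet`, `lexicon_score` and `weighted_sentiment`, and the default weights of 0.4 and 0.6 are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical toy lexicon (assumption): the paper's sentiment scorer
# is not described in the abstract.
LEXICON = {
    "good": 1.0, "great": 1.0, "love": 1.0,
    "bad": -1.0, "hate": -1.0, "slow": -1.0,
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def split_tweet(tweet):
    """Return (cleaned_tokens, hashtag_tokens) for one tweet."""
    text = URL_RE.sub(" ", tweet.lower())   # drop URLs first
    hashtags = HASHTAG_RE.findall(text)     # keep hashtag words separately
    text = MENTION_RE.sub(" ", text)        # drop @mentions
    text = HASHTAG_RE.sub(" ", text)        # drop hashtags from the body
    text = NON_ALPHA_RE.sub(" ", text)      # drop punctuation and digits
    return text.split(), hashtags


def lexicon_score(tokens):
    """Mean lexicon polarity in [-1, 1]; 0.0 when no token matches."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def weighted_sentiment(tweet, w_hashtag=0.4, w_text=0.6):
    """Combine hashtag and cleaned-text scores with assumed weights."""
    tokens, hashtags = split_tweet(tweet)
    return w_hashtag * lexicon_score(hashtags) + w_text * lexicon_score(tokens)


if __name__ == "__main__":
    print(weighted_sentiment("Great phone, but the assistant is #slow"))
```

In this sketch a tweet whose body is positive but whose hashtag is negative ends up with a reduced, still positive score. A stemming or lemmatization step, as compared in the study, would be applied to the tokens before scoring.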
|Title of host publication||Proceedings of 2019 11th International Conference on Knowledge and Systems Engineering, KSE 2019|
|Editors||Josiane Mothe, Le Hoang Son, Nguyen Tran Quoc Vinh|
|Publisher||IEEE, Institute of Electrical and Electronics Engineers|
|Number of pages||8|
|Publication status||Published - 05 Dec 2019|
|Event||11th IEEE International Conference on Knowledge and Systems Engineering: KSE 2019 - University of Education and Science, The University of Danang, Da Nang, Viet Nam (24 Oct 2019 → 26 Oct 2019)|
|Other||The 11th KSE is an international forum for presentation, discussion, and exchange of the state-of-the-art research, development, and applications in the field of knowledge and systems engineering. The main objective of the KSE is to bring together researchers, practitioners and students not only to share research results and practical applications but also to foster collaboration in the field. The KSE this year will be held in Da Nang, a coastal and tourism city with beautiful beaches and a lot of exciting experiences. The conference will be co-organized by The University of Da Nang - University of Science and Education (UD-USE) and VNU University of Engineering and Technology (VNU-UET).|