Feature selection approach for twitter sentiment analysis and text classification based on chi-square and naïve bayes

S. Paudel, P. W.C. Prasad, Abeer Alsadoon, Md Rafiqul Islam, Amr Elchouemi

Research output: Book chapter/Published conference paper › Conference paper

Abstract

With the rapid growth of web and mobile technology, social networking services such as Twitter are widely used, and large amounts of data are generated on them daily. Efficient sentiment analysis of such data is important for a range of applications, and improving the accuracy of sentiment detection is the main aim of this research. This report examines the combination of a chi-squared feature selection algorithm, k-means clustering, and TF-IDF attribute weighting with a Naïve Bayes classifier for classifying text and sentiment in communications generated on Twitter. This approach is compared with other Naïve Bayes-based approaches to give an account of their relative strengths and weaknesses. In experiments on multi-domain Twitter datasets, results indicate that the proposed method shows superior performance across a range of. The main aim of this research is to enhance the performance of the Naïve Bayes classifier using a feature selection technique.
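As a rough illustration of two of the ingredients named in the abstract, the sketch below scores terms with the chi-squared statistic, keeps the highest-scoring ones, and trains a multinomial Naïve Bayes classifier on them. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy tweets, the function names (`chi2_score`, `select_features`, `train_nb`, `predict`), and the use of add-one smoothing are all invented here for illustration. The k-means clustering and TF-IDF weighting steps are left out, because the abstract does not say how they are combined with the classifier.

```python
import math
from collections import Counter

# Toy labelled tweets, invented for illustration (not the paper's datasets).
RAW = [
    ("love this phone great battery", "pos"),
    ("great service love it", "pos"),
    ("happy with the great camera", "pos"),
    ("hate this phone terrible battery", "neg"),
    ("terrible service hate it", "neg"),
    ("awful camera and terrible screen", "neg"),
]
DOCS = [(text.split(), label) for text, label in RAW]


def chi2_score(term, label, docs):
    """Chi-squared statistic of the 2x2 table (term present?) x (doc in class?)."""
    a = sum(1 for toks, y in docs if term in toks and y == label)
    b = sum(1 for toks, y in docs if term in toks and y != label)
    c = sum(1 for toks, y in docs if term not in toks and y == label)
    d = sum(1 for toks, y in docs if term not in toks and y != label)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, k):
    """Keep the k terms with the highest chi-squared score for any class."""
    vocab = sorted({t for toks, _ in docs for t in toks})
    labels = sorted({y for _, y in docs})
    scores = {t: max(chi2_score(t, y, docs) for y in labels) for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def train_nb(docs, features):
    """Collect the counts for multinomial Naive Bayes over the selected terms."""
    feats = set(features)
    class_docs = Counter(y for _, y in docs)
    term_counts = {y: Counter() for y in class_docs}
    for toks, y in docs:
        term_counts[y].update(t for t in toks if t in feats)
    return class_docs, term_counts, feats


def predict(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_docs, term_counts, feats = model
    total_docs = sum(class_docs.values())
    toks = [t for t in text.split() if t in feats]
    best_label, best_logp = None, -math.inf
    for y in sorted(class_docs):
        logp = math.log(class_docs[y] / total_docs)
        denom = sum(term_counts[y].values()) + len(feats)
        for t in toks:
            logp += math.log((term_counts[y][t] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = y, logp
    return best_label


selected = select_features(DOCS, 4)
model = train_nb(DOCS, selected)
print(selected)  # ['great', 'terrible', 'hate', 'love']
print(predict(model, "love the great screen"))
```

On this toy data the terms that occur in only one class ("great", "terrible") get the highest scores, while terms spread evenly across both classes ("phone", "battery") score zero and are dropped before the classifier sees them.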

Language: English
Title of book or conference publication: International Conference on Applications and Techniques in Cyber Security and Intelligence ATCI 2018 - Applications and Techniques in Cyber Security and Intelligence
Editors: Mohammed Atiquzzaman, Zheng Xu, Jemal Abawajy, Kim-Kwang Raymond Choo, Rafiqul Islam
Publisher: Springer-Verlag London Ltd.
Pages: 281-298
Number of pages: 18
ISBN (Print): 9783319987750
DOIs: https://doi.org/10.1007/978-3-319-98776-7_30
Publication status: Published - 01 Jan 2019
Event: International Conference on Applications and Techniques in Cyber Intelligence, ATCI 2018 - Shanghai, China
Duration: 11 Jul 2018 to 13 Jul 2018

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 842
ISSN (Print): 2194-5357

Conference

Conference: International Conference on Applications and Techniques in Cyber Intelligence, ATCI 2018
Country: China
City: Shanghai
Period: 11/07/18 to 13/07/18

Fingerprint

Feature extraction
Classifiers
Communication
Experiments

Cite this

Paudel, S., Prasad, P. W. C., Alsadoon, A., Islam, M. R., & Elchouemi, A. (2019). Feature selection approach for twitter sentiment analysis and text classification based on chi-square and naïve bayes. In M. Atiquzzaman, Z. Xu, J. Abawajy, K-K. R. Choo, & R. Islam (Eds.), International Conference on Applications and Techniques in Cyber Security and Intelligence ATCI 2018 - Applications and Techniques in Cyber Security and Intelligence (pp. 281-298). (Advances in Intelligent Systems and Computing; Vol. 842). Springer-Verlag London Ltd. https://doi.org/10.1007/978-3-319-98776-7_30
@inproceedings{664d5378dc9948209ee77434b5c9e491,
title = "Feature selection approach for twitter sentiment analysis and text classification based on chi-square and na{\"i}ve bayes",
keywords = "Chi-squared, Feature selection, Na{\"i}ve Bayes, TF-IDF, Twitter sentiment analysis",
author = "S. Paudel and Prasad, {P. W.C.} and Abeer Alsadoon and Islam, {Md Rafiqul} and Amr Elchouemi",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-319-98776-7_30",
language = "English",
isbn = "9783319987750",
series = "Advances in Intelligent Systems and Computing",
publisher = "Springer-Verlag London Ltd.",
pages = "281--298",
editor = "Mohammed Atiquzzaman and Zheng Xu and Jemal Abawajy and Choo, {Kim-Kwang Raymond} and Rafiqul Islam",
booktitle = "International Conference on Applications and Techniques in Cyber Security and Intelligence ATCI 2018 - Applications and Techniques in Cyber Security and Intelligence",
address = "Germany",

}

TY - GEN

T1 - Feature selection approach for twitter sentiment analysis and text classification based on chi-square and naïve bayes

AU - Paudel, S.

AU - Prasad, P. W.C.

AU - Alsadoon, Abeer

AU - Islam, Md Rafiqul

AU - Elchouemi, Amr

PY - 2019/1/1

Y1 - 2019/1/1

KW - Chi-squared

KW - Feature selection

KW - Naïve Bayes

KW - TF-IDF

KW - Twitter sentiment analysis

UR - http://www.scopus.com/inward/record.url?scp=85056824047&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056824047&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-98776-7_30

DO - 10.1007/978-3-319-98776-7_30

M3 - Conference paper

SN - 9783319987750

T3 - Advances in Intelligent Systems and Computing

SP - 281

EP - 298

BT - International Conference on Applications and Techniques in Cyber Security and Intelligence ATCI 2018 - Applications and Techniques in Cyber Security and Intelligence

A2 - Atiquzzaman, Mohammed

A2 - Xu, Zheng

A2 - Abawajy, Jemal

A2 - Choo, Kim-Kwang Raymond

A2 - Islam, Rafiqul

PB - Springer-Verlag London Ltd.

ER -
