Automatically Correcting Noisy Labels for Improving Quality of Training Set in Domain-specific Sentiment Classification

Original Research Article | Aug 11, 2022 | Vol. 23 No. 2 (2023) | DOI: 10.55003/cast.2022.02.23.006

Abstract

Label noise in the training set can degrade the performance of a classification model. Sentiment classification is also affected by this problem, because customer reviews can be mislabeled: some customers give a product or service a rating score that is inconsistent with the content of their review. If business owners look only at the overall rating picture, which includes such mislabeled reviews, they may make erroneous business decisions. This issue is the main challenge addressed in this study. We assume that if customer reviews with noisy labels in the training data are validated and corrected before the learning process, the training set can yield a predictive model that gives better results for sentiment analysis or classification. We therefore propose a mechanism, called the polarity label analyzer, to improve the quality of a training set with noisy labels before the learning process. The polarity label analyzer assigns a polarity class to each sentence in a customer review, and the polarity class of the review as a whole is then determined by voting. In our experiment, datasets were downloaded from TripAdvisor, and two linguistic experts assigned the correct labels to the customer reviews as the ground truth. Sentiment classifiers were developed using the k-NN, Logistic Regression, XGBoost, Linear SVM and CNN algorithms. Compared with sentiment classifiers trained without training set improvement, classifiers trained with the improved training set achieved average F1 and accuracy scores that were higher by 20.59%.

Keywords: label noise; sentiment classification; polarity label analyzer; k-NN; logistic regression; XGBoost; linear SVM; CNN
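
The voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' polarity label analyzer: the abstract does not say how sentence-level polarity is assigned, so the tiny word lists, the function names (`sentence_polarity`, `review_polarity`, `correct_labels`) and the tie rule (keep the original label) are all assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, assumed for illustration only.
POSITIVE = {"good", "great", "clean", "friendly", "excellent", "love"}
NEGATIVE = {"bad", "dirty", "rude", "terrible", "noisy", "poor"}


def sentence_polarity(sentence):
    """Assign a polarity class to one sentence by counting lexicon hits."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def review_polarity(review, fallback):
    """Decide the review-level class by majority vote over its sentences."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", review.strip()) if s]
    votes = Counter(sentence_polarity(s) for s in sentences)
    if votes["positive"] > votes["negative"]:
        return "positive"
    if votes["negative"] > votes["positive"]:
        return "negative"
    # Assumed tie rule: keep the label that came with the review.
    return fallback


def correct_labels(reviews):
    """Relabel (text, label) pairs before they are used for training."""
    return [(text, review_polarity(text, label)) for text, label in reviews]


noisy = [
    ("The room was dirty. The staff were rude. Location was good.", "positive"),
    ("Great breakfast. Friendly staff.", "positive"),
]
cleaned = correct_labels(noisy)
```

In this sketch the first review carries a positive rating but two of its three sentences vote negative, so its label is changed before training, while the second review keeps its label. A real analyzer would replace the lexicon lookup with a proper sentence-level sentiment model.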

*Corresponding author: Tel.: (+66) 43654359 ext. 5365, 5003; E-mail: jantima.p@msu.ac.th

Author Information

Thananchai Khamket

Intellect Laboratory, Faculty of Informatics, Mahasarakham University, Mahasarakham, Thailand

Jantima Polpinij*

Intellect Laboratory, Faculty of Informatics, Mahasarakham University, Mahasarakham, Thailand

