Comparative Evaluation of Unigram and Bigram Models for Sentiment Classification of Hotel Reviews

Bharti B. Balande; Ramesh R. Manza; Suryakant S. Revate

doi:10.32628/IJSRST2613378

Authors

Bharti B. Balande, Department of Computer Science and IT, Dr. Babasaheb Ambedkar Marathwada University, Chattrapati Sambhajinagar, Maharashtra, India
Ramesh R. Manza, Department of Computer Science and IT, Dr. Babasaheb Ambedkar Marathwada University, Chattrapati Sambhajinagar, Maharashtra, India
Suryakant S. Revate, Department of Computer Science & IT, Shri Chhatrapati Shivaji College, Omerga, Maharashtra, India

Keywords:

Sentiment Analysis, N-Gram Models, Text Classification, Machine Learning, Hotel Reviews, Natural Language Processing

Abstract

This paper presents a comparative analysis of N-gram-based sentiment classification models applied to large-scale hotel review data. Three feature extraction methods (unigram, bigram, and combined unigram-bigram representations) were used to classify sentiment polarity into negative, neutral, and positive classes. Six machine learning classifiers (Logistic Regression, Support Vector Machine (SVM), Naive Bayes, Ridge, Stochastic Gradient Descent (SGD), and Passive Aggressive) were evaluated across three baseline models (M1-M3). The experiments show that the unigram-bigram model (M3) performs best: M3-UniBigram-SVM achieved an accuracy, precision, and recall of 0.8538, 0.8298, and 0.8538, respectively. Analysis of the confusion matrices revealed that positive sentiment was classified most accurately, while neutral sentiment was the most difficult category because of its semantic ambiguity. The findings support the hypothesis that hybrid N-gram representations can substantially improve sentiment classification performance compared with unigram-only or bigram-only models. The research establishes strong baseline models that can support the development of more advanced hybrid sentiment classification models.
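
To make the three feature representations concrete, the following minimal sketch counts unigram, bigram, and combined features for a single review. It is an illustration only, not the authors' pipeline: the function names, the lowercase whitespace tokenization, and the mapping of M1 and M2 to the unigram and bigram representations are assumptions (the abstract identifies only M3 as the unigram-bigram model).

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in tokens, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, mode):
    """Count n-gram features for one review.

    mode is "unigram", "bigram", or "unibigram" (hypothetical labels,
    assumed here to correspond to models M1, M2, and M3).
    Tokenization is a plain lowercase whitespace split, which is an
    assumption and not taken from the paper.
    """
    tokens = text.lower().split()
    if mode == "unigram":
        grams = ngrams(tokens, 1)
    elif mode == "bigram":
        grams = ngrams(tokens, 2)
    elif mode == "unibigram":
        # The hybrid representation is the union of both feature sets.
        grams = ngrams(tokens, 1) + ngrams(tokens, 2)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return Counter(grams)


if __name__ == "__main__":
    review = "room was not clean"
    for mode in ("unigram", "bigram", "unibigram"):
        print(mode, sorted(extract_features(review, mode)))
```

In this toy review, the bigram "not clean" captures a negation that the separate unigrams "not" and "clean" cannot express, which is one plausible reason a combined representation could help a classifier.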
