e-ISSN 2231-8526
ISSN 0128-7680
Nur Atiqah Sia Abdullah and Nur Ida Aniza Rusli
Pertanika Journal of Science & Technology, Volume 29, Issue 1, January 2021
DOI: https://doi.org/10.47836/pjst.29.1.25
Keywords: Machine learning, machine translation, multilingual sentiment analysis, opinion mining, pre-processing, sentiment analysis
Published on: 22 January 2021
With the explosive growth of social media, members of the online community can freely express their opinions without disclosing their identities. People with hidden agendas can easily post fake opinions to discredit target products, services, politicians, or organizations. With such big data, monitoring opinions and distilling their sentiments remains a formidable task because of the proliferation of diverse sites carrying a large volume of opinions expressed in multiple languages. Therefore, this paper provides a systematic literature review on multilingual sentiment analysis, summarising the common languages supported, the pre-processing techniques, the existing sentiment analysis approaches, and the evaluation models that have been used. The review found that most of the models supported two languages, and that English was the most used language in sentiment analysis studies. None of the reviewed literature catered for the combination of English, Chinese, Malay, and Hindi in multilingual sentiment analysis. The common pre-processing techniques for the multilingual domain are tokenization, normalization, capitalization, N-gram, and machine translation. Meanwhile, the classification technique for multilingual sentiment is hybrid sentiment analysis, which comprises localized language analysis and unsupervised topic clustering, followed by multilingual sentiment analysis. In terms of evaluation, most of the studies used precision, recall, and accuracy as the benchmarks for their results.
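As an illustration only, the pre-processing steps and evaluation measures named above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of any reviewed study: the function names (`normalize`, `tokenize`, `ngrams`, `evaluate`), the regex-based tokenizer, and the `"pos"`/`"neg"` label scheme are all hypothetical choices made for the example, and machine translation is omitted.

```python
import re
from typing import Dict, List, Sequence, Tuple


def normalize(text: str) -> str:
    """Handle capitalization by lowercasing, then collapse whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def tokenize(text: str) -> List[str]:
    """Split text into word tokens.

    Assumption: a simple Unicode-aware regex is enough for the example.
    Languages written without spaces (e.g. Chinese) would need a
    dedicated word segmenter instead.
    """
    return re.findall(r"\w+", text)


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-token windows."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def evaluate(gold: Sequence[str], predicted: Sequence[str],
             positive: str = "pos") -> Dict[str, float]:
    """Compute precision, recall, and accuracy for one target label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    correct = sum(g == p for g, p in pairs)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": correct / len(pairs) if pairs else 0.0,
    }


# Hypothetical mixed English/Malay input, for illustration only.
tokens = tokenize(normalize("Great  Phone, BAGUS sangat!"))
bigrams = ngrams(tokens, 2)
scores = evaluate(["pos", "neg", "pos", "neg"], ["pos", "pos", "neg", "neg"])
```

Precision here is the share of predicted positives that are truly positive, recall is the share of true positives that were found, and accuracy is the share of all predictions that match the gold labels.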