PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY

 

e-ISSN 2231-8526
ISSN 0128-7680

A Deep Learning-based Classification Model for Arabic News Tweets Using Bidirectional Long Short-Term Memory Networks

Chin-Teng Lin, Mohammed Thanoon and Sami Karali

Pertanika Journal of Science & Technology, Volume 32, Issue 4, July 2024

DOI: https://doi.org/10.47836/pjst.32.4.09

Keywords: Arabic, Arabic dataset, Arabic news, ML, NLP, Twitter

Published on: 25 July 2024

This research develops a classification model for Arabic news tweets using Bidirectional Long Short-Term Memory (BiLSTM) networks. Tweets about Arabic news were gathered between August 2016 and August 2020 using custom Python scripts, the Twitter API, and the GetOldTweets3 Python library, and were divided into five categories. A BiLSTM model was then trained and tested on this dataset. It achieved an average accuracy, precision, recall, and F1-score of 0.88, 0.92, 0.88, and 0.89, respectively. These results could have practical implications for Arabic machine learning and NLP tasks in research and practice.
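As a rough illustration of how averaged scores of this kind can be derived for a five-category classifier, the sketch below computes accuracy and macro-averaged precision, recall, and F1-score from a confusion matrix. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here, and the function name `macro_metrics` and the example counts are hypothetical rather than taken from the study.

```python
from typing import Dict, List


def macro_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Accuracy and macro-averaged precision, recall and F1.

    confusion[i][j] is the number of items whose true class is i
    and whose predicted class is j (assumed layout).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                           # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical counts for five categories; NOT the study's data.
example = [
    [90, 3, 2, 3, 2],
    [4, 85, 5, 3, 3],
    [2, 4, 88, 4, 2],
    [5, 2, 3, 86, 4],
    [3, 3, 2, 4, 88],
]
print(macro_metrics(example))
```

Note that under macro averaging the F1-score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.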
