Linguistic feature fusion for Arabic fake news detection and named entity recognition using reinforcement learning and swarm optimization
DOI:
https://doi.org/10.1016/j.neucom.2024.128078
Keywords:
Arabic fake news detection, Deep learning, Linguistic feature fusion, Named entity recognition, Reinforcement learning, Swarm optimization
Abstract
Social media use is rising in Arabic-speaking countries, driven by improved internet access, affordable smartphones, and growing digital connectivity. This study addresses a significant challenge that comes with it: the widespread dissemination of fake news. The ease and speed of spreading information on social media, coupled with a lack of stringent fact-checking, exacerbate the problem of misinformation. We examine how linguistic features, especially Named Entity Recognition (NER) features, contribute to detecting fake news. We built two models: an AraBERT-based Multi-task Learning (MTL) model for classifying Arabic fake news, and a token classification model that focuses on fake news NER features. We combine the embedding vectors from these models using an embedding fusion technique and apply machine learning algorithms for fake news detection in Arabic. We also introduce a feature selection algorithm, RLTTAO, which improves the Triangulation Topology Aggregation Optimizer (TTAO) with Reinforcement Learning and random opposition-based learning; by selecting relevant features, it improves the fusion process. Our results show that incorporating NER features improves fake news detection accuracy on 5 of 7 datasets, with an average improvement of 1.62%.
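The fusion and feature selection steps described in the abstract can be illustrated with a small sketch. The abstract does not say which fusion operator is used or how random opposition-based learning is defined, so the code below makes two assumptions: fusion is plain concatenation of the two embedding matrices, and the opposite candidate follows one common formulation, lower + upper - r * position with r drawn uniformly from [0, 1). The function names, array sizes, and the 0.5 mask threshold are hypothetical, the embeddings are random stand-ins, and the reinforcement learning component and the TTAO update rules of RLTTAO are not reproduced.

```python
import numpy as np


def fuse_embeddings(cls_emb, ner_emb):
    """Concatenate two per-document embedding matrices along the feature axis (assumed fusion)."""
    if cls_emb.shape[0] != ner_emb.shape[0]:
        raise ValueError("both embedding matrices need one row per document")
    return np.concatenate([cls_emb, ner_emb], axis=1)


def random_opposite(position, lower, upper, rng):
    """Random opposition-based candidate: lower + upper - r * position, r ~ U[0, 1)."""
    r = rng.random(position.shape)
    return np.clip(lower + upper - r * position, lower, upper)


def to_mask(position, threshold=0.5):
    """Turn a continuous position in [0, 1] into a boolean feature mask."""
    mask = position > threshold
    if not mask.any():
        mask[np.argmax(position)] = True  # always keep at least one feature
    return mask


rng = np.random.default_rng(0)
cls_emb = rng.normal(size=(4, 6))  # stand-in for the classification model's embeddings
ner_emb = rng.normal(size=(4, 3))  # stand-in for the NER model's embeddings
fused = fuse_embeddings(cls_emb, ner_emb)

# One candidate feature subset and its random opposite, as a swarm optimizer might explore.
position = rng.random(fused.shape[1])
opposite = random_opposite(position, 0.0, 1.0, rng)
selected = fused[:, to_mask(opposite)]
print(fused.shape, selected.shape)
```

In a full feature selection loop, each candidate mask would be scored by training a classifier on the selected columns, and the optimizer would keep whichever of the position and its opposite scores better.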