Optimizing Heart Disease Prediction Models through SMOTE: Addressing Data Imbalance

Authors

  • Waheeb Baddah, Azal University for Human Development; International University of Technology Twintech, Sana’a, Yemen
  • Hamzah Ali Qasem, International University of Technology Twintech; 21 September University of Medical and Applied Sciences, Sana’a, Yemen
  • Ayman Alsabry, Department of Computer Science, International University of Technology Twintech, Sana’a, Yemen
  • Rana Saleh Al Gawani, Lebanese International University, Sana’a, Yemen
  • Wafa Mohammed Alzuraiqi, Department of Computer Science, International University of Technology Twintech, Sana’a, Yemen
  • F. E. Hanash, Emirates International University

DOI:

https://doi.org/10.1109/eSmarTA62850.2024.10638899

Keywords:

Heart, Measurement, Accuracy, Machine learning, Predictive models, Data collection, Data models

Abstract

Class imbalance is a significant challenge in medical diagnostics, particularly in machine-learning-based heart disease prediction. This study investigates whether the Synthetic Minority Over-sampling Technique (SMOTE) can mitigate this imbalance and improve the predictive performance of heart disease models. After data collection and preprocessing, 21 machine learning models are trained on both the original and the SMOTE-balanced versions of the Cleveland and Statlog heart disease datasets. Their accuracy, precision, recall, and F1-score are then compared, and t-tests are used to assess whether the differences are statistically significant. Models trained on SMOTE-balanced data showed statistically significant improvements in precision and F1-score on both datasets, and often in recall, while the effect on accuracy varied by dataset. These results indicate that SMOTE can improve model performance and thereby contribute to more effective and reliable heart disease diagnostics.
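The abstract does not say which SMOTE implementation or neighbour count k was used. The sketch below shows the core idea under two assumptions: features are numeric and the label is binary. SMOTE creates a synthetic minority sample in three steps. It picks a minority point, picks one of that point's k nearest minority-class neighbours, and then interpolates at a random fraction of the distance between the two. The helper names `smote` and `balance`, the default `k=5`, and the random choice of base point are illustrative assumptions, not the paper's settings. A maintained implementation is available in imbalanced-learn via `SMOTE(k_neighbors=5).fit_resample(X, y)`. In practice, oversampling is applied to the training split only, so that synthetic points do not leak into the evaluation data.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic points by interpolating
    between a random minority sample and one of its k nearest
    minority-class neighbours (a simplified SMOTE variant)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest minority neighbours (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(x, minority[j]),
        )
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Hypothetical helper: oversample the minority class of a binary
    dataset until both classes have the same number of rows."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority_count = len(y) - len(minority)
    n_needed = majority_count - len(minority)
    if n_needed <= 0:
        return list(X), list(y)
    new = smote(minority, n_needed, k=k, seed=seed)
    return list(X) + new, list(y) + [minority_label] * len(new)
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the region the minority class already occupies. Plain duplication would only repeat existing rows.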
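The four metrics and the significance test named in the abstract can be sketched as follows. The abstract does not specify the t-test variant. This sketch assumes a paired test over per-model scores, in which each model contributes one score on the original data and one on the SMOTE-balanced data. The function names are hypothetical, and only the t statistic is computed here. Obtaining the p-value requires the Student t distribution, which `scipy.stats.ttest_rel(after, before)` provides along with the statistic.

```python
import math
import statistics


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier.
    Precision/recall/F1 fall back to 0.0 when their denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def paired_t_statistic(before, after):
    """Paired t statistic and degrees of freedom for the differences
    after - before (e.g. one F1-score per model, without vs. with SMOTE)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    diffs = [a - b for a, b in zip(after, before)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero spread; t is undefined")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

A positive t means the SMOTE-trained scores are higher on average. Whether the difference is significant depends on the p-value at n - 1 degrees of freedom.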

Published

2024-08-01

Section

Articles

How to Cite

Baddah, W., Qasem, H. A., Alsabry, A., Al Gawani, R. S., Alzuraiqi, W. M., & Hanash, F. E. (2024). Optimizing Heart Disease Prediction Models through SMOTE: Addressing Data Imbalance. Emirates International University Digital Repository, 1(1). https://doi.org/10.1109/eSmarTA62850.2024.10638899
