An Optimized Framework Based on Data Exploration and Dynamic Ensemble-Based Models for Breast Cancer Prediction
DOI:
https://doi.org/10.47839/ijc.23.2.3544
Keywords:
Data Exploration, Ensemble Classifier, Hyperparameters Tuning, Machine Learning
Abstract
Breast cancer (BC) is a major global health concern. Detecting BC at an early stage widens the range of treatment options and can help patients avoid more aggressive treatments. Machine learning (ML) offers significant potential for improving the accuracy and speed of BC diagnosis, personalizing treatment, and identifying high-risk patients. However, applying ML poses significant challenges, including the need for high-quality data and for more flexible models with well-tuned parameters. In this paper, we propose an optimized framework based on multi-stage data exploration, which provides a comprehensive approach to ensuring that the data is well prepared for ML. In addition, the framework includes dynamic ensemble-based classifiers, which combine multiple independent classifiers to improve accuracy and, in conjunction with cross-validation, mitigate the risk of overfitting. These classifiers are optimized using Bayesian hyperparameter tuning, which searches for the optimal values of the model's hyperparameters and can significantly improve the prediction accuracy of the resulting model. We evaluate the framework on the publicly available Wisconsin Diagnostic Breast Cancer (WDBC) dataset and compare our results with other state-of-the-art models. The best result is 100% accuracy and recall, obtained with the following hyperparameters: ensemble method = AdaBoost, number of learners = 322, learning rate = 0.9350, and maximum number of splits = 1. The proposed framework achieves the highest accuracy among the compared models (mean = 99.35%, best = 100%, worst = 98.7%, standard deviation = 0.325). The framework can potentially improve the accuracy and efficiency of BC prediction, ultimately leading to better outcomes for patients.
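To make the reported configuration concrete, the following is a minimal sketch of textbook discrete AdaBoost over one-split decision trees (stumps), which is what a maximum of one split per learner amounts to. It is not the paper's implementation: the abstract does not describe the code, the toy data below is hypothetical (not WDBC), the function names are illustrative, the Bayesian tuning step is omitted, and scaling each learner's weight by the learning rate is an assumption about how that hyperparameter is applied.

```python
import math

def fit_stump(X, y, w):
    """Find the one-split rule (err, feature, threshold, sign) with least weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # The rule predicts `sign` when feature j exceeds t, else `-sign`.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] > t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] > t else -sign

def adaboost_fit(X, y, n_learners=50, learning_rate=0.935):
    """Train a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    ensemble = []
    for _ in range(n_learners):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        # Assumption: the learning rate shrinks each learner's vote.
        alpha = learning_rate * 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: the positive class occupies the middle of one feature,
# so no single stump separates the classes but a weighted vote of stumps can.
X = [[float(v)] for v in range(10)]
y = [-1, -1, -1, 1, 1, 1, 1, -1, -1, -1]

ensemble = adaboost_fit(X, y, n_learners=50, learning_rate=0.935)
preds = [adaboost_predict(ensemble, row) for row in X]
```

In this sketch `n_learners`, `learning_rate`, and the one-split restriction correspond to the three numeric hyperparameters listed in the abstract. The reported value of 322 learners would be passed as `n_learners=322`, and a tuning procedure would search over these values using cross-validated accuracy rather than the training fit shown here.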