Q-GEV Based Novel Trainable Clustering Scheme for Reducing Complexity of Data Clustering
DOI:
https://doi.org/10.1111/exsy.70011
Keywords:
artificial intelligence; continual learning; data clustering; density peak clustering; generalised extreme value; learning model; machine learning
Abstract
This paper presents a new data clustering technique that aims to improve the performance of the trainable path-cost algorithm and to reduce the computational complexity of data clustering models. The proposed method facilitates the discovery of natural groupings and behaviours, which is crucial for effective coordination in complex environments. It identifies natural groupings within a set of features and detects the clusters whose members behave most similarly, overcoming limitations of traditional state-of-the-art methods. The algorithm uses density peak clustering to determine cluster centers and then extracts features from the paths that pass through these peak points (centers). These features are used to train a support vector machine (SVM) to predict the labels of the other points. The proposed algorithm enhances this scheme in two ways: first, it employs the Q-Generalised Extreme Value (Q-GEV) distribution under power normalisation instead of the traditional generalised extreme value distribution, which increases modelling flexibility; second, it replaces the SVM with a random vector functional link (RVFL) network, which helps avoid overfitting and improves label prediction accuracy. The proposed clustering algorithm is evaluated in experiments on UCI benchmark datasets and real-world data, where it shows significant improvements in F1 measure, Jaccard index, purity, and accuracy, highlighting its capability to accurately identify paths between similar clusters. Its average F1 measure, Jaccard index, purity, and accuracy are 76.87%, 56.29%, 80.29%, and 79.64%, respectively.
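The density peak step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard formulation of Rodriguez and Laio (2014) with a cutoff kernel and simply takes the points with the largest density-times-distance score as centers; it does not reproduce the paper's Q-GEV-based center detection, the path feature extraction, or the RVFL classifier. The function name `density_peaks` and the parameters `d_c` and `n_centers` are illustrative assumptions, not names from the paper.

```python
import numpy as np

def density_peaks(X, d_c, n_centers):
    """Select cluster centers with the density-peak heuristic (illustrative sketch)."""
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density rho: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1

    # delta: distance to the nearest point that ranks higher in density.
    # Ties in rho are broken by index order (an assumption of this sketch).
    n = len(X)
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    delta[order[0]] = dist[order[0]].max()  # densest point: use its largest distance
    for rank in range(1, n):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()

    # Centers combine high density with a large distance to any denser point.
    gamma = rho * delta
    centers = np.argsort(-gamma)[:n_centers]
    return centers, rho, delta

# Toy data (hypothetical): two well-separated Gaussian blobs of 50 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(5.0, 0.3, size=(50, 2)),
])
centers, rho, delta = density_peaks(X, d_c=0.5, n_centers=2)
```

On this toy data the two selected indices fall in different blobs, because only the densest point of each blob has a large `delta`. Choosing the number of centers by hand is the part that an extreme value model of the scores is meant to automate.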
License
Copyright (c) 2026. This Open Access article is distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0), permitting unrestricted use, distribution, and adaptation provided the original author and source are properly credited.