Browsing by Author "Senol, Ali"
Now showing 1 - 7 of 7
Item: A comparison of tree data structures in the streaming data clustering issue (Gazi Univ, Fac Engineering Architecture, 2024) Senol, Ali; Kaya, Mahmut; Canbay, Yavuz
Processing streaming data is a challenging issue because of time and resource limitations. Clustering data streams is an efficient technique for analyzing this kind of data. This study proposes two new streaming data clustering algorithms, BT-AR Stream and VP-AR Stream, inspired by the KD-AR Stream clustering algorithm [32]. Our algorithms use Ball-Tree and Vantage-Point (VP) Tree data structures instead of KD-Tree. To reveal the efficiency of the proposed algorithms, we tested them on 18 benchmark datasets in terms of clustering quality and runtime complexity, and then compared the obtained results with those of the KD-AR Stream algorithm. According to the results, the BT-AR Stream algorithm was the most successful in terms of clustering quality and runtime complexity, as illustrated in Figure A.
Purpose: This study aims to analyze and compare the efficiency of tree data structures in data stream clustering, in terms of both clustering quality and runtime performance.
Theory and Methods: To compare the efficiency of tree data structures in data stream clustering, we propose two stream clustering algorithms inspired by KD-AR Stream. We use Ball-Tree and Vantage-Point Tree data structures instead of KD-Tree, yielding two new stream clustering algorithms named BT-AR Stream and VP-AR Stream. To compare the success of the algorithms, we tested them on 18 benchmark datasets in terms of clustering quality and runtime complexity.
Results: According to the results obtained in the experimental study, the BT-AR Stream algorithm, which uses Ball-Tree, was the most successful in both clustering quality and runtime complexity on KDD, a high-dimensional dataset.
On the other hand, the clustering quality of all algorithms was good on the other datasets.
Conclusion: Although the clustering quality of all three algorithms was good, the BT-AR Stream algorithm was the most successful because KDD is high-dimensional. Furthermore, it is the fastest of the compared algorithms.
Item: A new hybrid feature reduction method by using MCMSTClustering algorithm with various feature projection methods: a case study on sleep disorder diagnosis (Springer London Ltd, 2024) Senol, Ali; Talan, Tarik; Akturk, Cemal
In machine learning, a large number of features that are irrelevant or only weakly relevant to the outcome can reduce classification success and runtime performance. For this reason, feature selection and reduction methods are widely used. The aim is to eliminate irrelevant features or to transform the features into a smaller set of new features that are relevant to the outcome. However, in some cases, feature reduction methods are not sufficient on their own to increase success. In this study, we propose a new hybrid feature projection model to increase the classification performance of classifiers. To this end, the MCMSTClustering algorithm is used in the data preprocessing stage of classification together with various feature projection methods, namely PCA, LDA, SVD, t-SNE, NCA, Isomap, and PR, to increase the classification performance of sleep disorder diagnosis. To determine the best parameters of the MCMSTClustering algorithm, we used the VIASCKDE Index, Dunn Index, Silhouette Index, Adjusted Rand Index, and accuracy as cluster quality evaluation methods. To evaluate the performance of the proposed model, we first appended the class labels produced by MCMSTClustering to the dataset as a new feature. We then applied the selected feature projection methods to decrease the number of features, ran the kNN algorithm on the dataset, and finally compared the obtained results.
To reveal the efficiency of the proposed model, we tested it on a sleep disorder diagnosis dataset and compared it with two baselines: pure kNN, and kNN with the feature projection methods used in the proposed approach. According to the experimental results, the proposed method with Kernel PCA as the feature projection method was the best model, with a classification accuracy of 0.9627. In addition, the MCMSTClustering algorithm increased the performance of PCA, Kernel PCA, SVD, t-SNE, and PR, while the performance of LDA, NCA, and Isomap remained the same.
Item: ANDClust: An Adaptive Neighborhood Distance-Based Clustering Algorithm to Cluster Varying Density and/or Neck-Typed Datasets (Wiley-VCH Verlag GmbH, 2024) Senol, Ali
Although density-based clustering algorithms can successfully define clusters of arbitrary shapes, they encounter issues when the dataset has varying densities or neck-typed clusters, because they require precise distance parameters, such as the eps parameter of DBSCAN. These approaches assume that data density is homogeneous, but this is rarely the case in practice. In this study, a new clustering algorithm named ANDClust (Adaptive Neighborhood Distance-based Clustering Algorithm) is proposed to handle datasets with varying density and/or neck-typed clusters. The algorithm consists of three parts. The first part uses Multivariate Kernel Density Estimation (MulKDE) to find the dataset's peak points, which serve as the starting points for the Minimum Spanning Tree (MST) that constructs clusters in the second part. Lastly, an Adaptive Neighborhood Distance (AND) ratio is used to weight the distance between data pairs. This enables the approach to support inter-cluster and intra-cluster density variation by acting as if the distance parameter differed for each data point in the dataset. ANDClust is tested on synthetic and real datasets to reveal its efficiency.
The algorithm shows superior clustering quality with good runtime compared to its competitors. Moreover, ANDClust can effectively define clusters of arbitrary shapes and process high-dimensional, imbalanced datasets that may contain outliers. This study proposes a new clustering algorithm named ANDClust to handle datasets with varying density and neck-typed clusters. In the proposed algorithm, an Adaptive Neighborhood Distance (AND) ratio is used to weight the distance between data pairs as if it differed for each data pair in the dataset. This makes the approach support not only varying density among clusters but also varying density inside a cluster.
Item: ImpKmeans: An Improved Version of the KMeans Algorithm, by Determining Optimum Initial Centroids, based on Multivariate Kernel Density Estimation and Kd-Tree (Budapest Tech, 2024) Senol, Ali
K-means is the best-known clustering algorithm because of its simplicity, speed, and efficiency. However, the resulting clusters are influenced by the randomly selected initial centroids. Therefore, many techniques have been implemented to solve this issue. In this paper, a new version of the k-means clustering algorithm named ImpKmeans (An Improved Version of the K-Means Algorithm by Determining Optimum Initial Centroids Based on Multivariate Kernel Density Estimation and Kd-Tree), which uses kernel density estimation to find the optimum initial centroids, is proposed. Kernel density estimation is used because it is a nonparametric distribution estimation method that can identify dense regions. To understand the efficiency of ImpKmeans, we compared it with some state-of-the-art algorithms. According to the experimental studies, the proposed algorithm was better than the compared versions of k-means. While ImpKmeans was the most successful algorithm in 46 of 60 tests, the second-best algorithm was the best in 34 tests.
Moreover, experimental results indicated that ImpKmeans is fast compared to the selected k-means versions.
Item: MCMSTClustering: defining non-spherical clusters by using minimum spanning tree over KD-tree-based micro-clusters (Springer London Ltd, 2023) Senol, Ali
Clustering is a technique for statistical data analysis and is widely used in many areas where class labels are not available. Major problems for clustering algorithms are handling high-dimensional, imbalanced, and/or varying-density datasets, detecting outliers, and defining arbitrary-shaped clusters. In this study, we propose a novel clustering algorithm named MCMSTClustering (Defining Non-Spherical Clusters by using Minimum Spanning Tree over KD-Tree-based Micro-Clusters) to overcome these issues simultaneously. Our algorithm consists of three parts. The first part defines micro-clusters using the KD-Tree data structure with range search. The second part constructs macro-clusters by applying a minimum spanning tree (MST) to the defined micro-clusters, and the final part regulates the defined clusters to increase the accuracy of the algorithm. To demonstrate the efficiency of our algorithm, we performed experimental studies against some state-of-the-art algorithms. The findings are presented in detail with tables and graphs, and the success of the proposed algorithm is confirmed using various performance evaluation criteria. According to the experimental studies, MCMSTClustering outperformed competitor algorithms in terms of clustering quality within acceptable runtime.
Besides, the obtained results showed that the novel algorithm can be applied effectively to many different clustering problems in the literature.
Item: MCMSTStream: applying minimum spanning tree to KD-tree-based micro-clusters to define arbitrary-shaped clusters in streaming data (Springer London Ltd, 2024) Erdinc, Berfin; Kaya, Mahmut; Senol, Ali
Stream clustering has emerged as a vital area for processing streaming data in real time, facilitating the extraction of meaningful information. While efficient approaches for defining and updating clusters based on similarity criteria have been proposed, outliers and noisy data pose a significant threat to the overall performance of stream clustering algorithms. Moreover, the limitation of existing methods in generating non-spherical clusters underscores the need for improved clustering quality. We propose a new stream clustering approach, MCMSTStream, to overcome the abovementioned challenges. The algorithm applies MST to micro-clusters defined by using the KD-Tree data structure in order to define macro-clusters. MCMSTStream is robust against outliers and noisy data and is able to define clusters with arbitrary shapes. Furthermore, the proposed algorithm exhibits notable speed and can handle high-dimensional data. The ARI and Purity indices are used to demonstrate the clustering success of MCMSTStream. The evaluation results reveal the superior performance of MCMSTStream compared to state-of-the-art stream clustering algorithms such as DenStream, DBSTREAM, and KD-AR Stream. The proposed method obtained a Purity value of 0.9780 and an ARI value of 0.7509 on the KDD dataset, the highest scores among the compared methods. On the other 11 datasets, it obtained much better results than its competitors. As a result, the proposed method is an effective stream clustering algorithm for datasets with outliers, high dimensionality, and arbitrary-shaped clusters.
In addition, its runtime performance is quite reasonable.
Item: VIASCKDE Index: A Novel Internal Cluster Validity Index for Arbitrary-Shaped Clusters Based on the Kernel Density Estimation (Hindawi Ltd, 2022) Senol, Ali
The cluster evaluation process is of great importance in machine learning and data mining. Evaluating the clustering quality of clusters shows how competent any proposed approach or algorithm is. Nevertheless, evaluating the quality of any clustering is still an issue. Although many cluster validity indices have been proposed, there is a need for new approaches that can measure clustering quality more accurately, because most existing approaches measure cluster quality correctly only when the shape of the cluster is spherical, and very few clusters in the real world are spherical. Therefore, a new Validity Index for Arbitrary-Shaped Clusters based on Kernel Density Estimation (the VIASCKDE Index) is proposed in this study to overcome this issue. In the VIASCKDE Index, we use the separation and compactness of each data point to support arbitrary-shaped clusters, and we utilize kernel density estimation (KDE) to give more weight to denser areas in clusters to support cluster compactness. To evaluate the performance of our approach, we compared it to state-of-the-art cluster validity indices. Experimental results demonstrate that the VIASCKDE Index outperforms the compared indices.
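Several items in the listing above compare KD-Tree and Ball-Tree for the range searches that KD-AR-Stream-style algorithms perform. A minimal sketch of such a query with scikit-learn's generic `KDTree` and `BallTree` (the dataset and radius below are illustrative, not taken from the papers):

```python
# Range search with KD-Tree vs Ball-Tree: both answer the same query;
# they differ only in how they partition space (axis-aligned splits vs
# hyperspheres), which drives the runtime differences on high-dimensional
# data discussed in the first item.
import numpy as np
from sklearn.neighbors import KDTree, BallTree

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 16))  # 1000 points, 16 dimensions

kd = KDTree(X)
bt = BallTree(X)

# Indices of all points within radius r of the first point.
query = X[:1]
r = 2.0
kd_idx = kd.query_radius(query, r=r)[0]
bt_idx = bt.query_radius(query, r=r)[0]

assert set(kd_idx) == set(bt_idx)  # identical answers, different trees
```

Swapping the tree type is a one-line change here, which is essentially the experimental design the tree-comparison study describes.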
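The ImpKmeans item seeds k-means from high-density regions instead of random points. A rough sketch of that seeding idea, using SciPy's `gaussian_kde` and greedy selection of dense, well-separated points; this illustrates the concept only and is not the published algorithm (which also uses a KD-tree), and the data, separation threshold, and cluster count are invented for the example:

```python
# KDE-guided seeding for k-means: rank points by estimated density and
# greedily pick dense points that are far from already-chosen seeds.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs of 200 points each.
X = np.vstack([rng.normal(0, 0.5, (200, 2)),
               rng.normal(5, 0.5, (200, 2))])

density = gaussian_kde(X.T)(X.T)      # KDE score of every point
order = np.argsort(density)[::-1]     # densest points first

seeds, min_sep = [], 3.0              # min_sep keeps seeds apart
for i in order:
    if all(np.linalg.norm(X[i] - s) > min_sep for s in seeds):
        seeds.append(X[i])
    if len(seeds) == 2:
        break

# Run k-means from the KDE-chosen seeds instead of random centroids.
km = KMeans(n_clusters=2, init=np.array(seeds), n_init=1).fit(X)
```

Because both seeds start inside dense regions of different blobs, a single Lloyd run converges cleanly, which is the failure mode of random initialization that ImpKmeans targets.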
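MCMSTClustering and MCMSTStream both build macro-clusters by running a minimum spanning tree over micro-clusters. A hedged sketch of that step with SciPy: build an MST over micro-cluster centers, cut edges longer than a threshold, and read macro-clusters off the connected components. The centers and threshold below are invented for illustration; the papers' micro-cluster construction and regulation steps are not reproduced.

```python
# MST over micro-cluster centers, then edge cutting to form macro-clusters.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

# Pretend these are micro-cluster centers found earlier (e.g. via KD-Tree
# range search): two groups of three, separated by a large gap.
centers = np.array([[0.0, 0.0], [1.0, 0.2], [0.5, 1.0],
                    [8.0, 8.0], [9.0, 8.5], [8.5, 9.2]])

dist = cdist(centers, centers)                 # pairwise distances
mst = minimum_spanning_tree(dist).toarray()    # dense MST edge weights

# Cut MST edges longer than the threshold; remaining connected
# components are the macro-clusters.
threshold = 3.0
mst[mst > threshold] = 0.0
n_macro, labels = connected_components(mst, directed=False)
```

Here the single MST edge bridging the two groups exceeds the threshold, so the six micro-clusters collapse into two macro-clusters; because the MST follows chains of nearby centers, the resulting macro-clusters can take arbitrary, non-spherical shapes.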