Browsing by Author "Şenol, Ali"

Item
An Investigation on the Use of Clustering Algorithms for Data Preprocessing in Breast Cancer Diagnosis
(2024) Şenol, Ali; Kaya, Mahmut
Classification algorithms are commonly used as decision support systems for diagnosing many diseases, such as breast cancer. Their accuracy can suffer if the data contains outliers or noise. For this reason, outlier detection methods are frequently used in this field. In this study, we propose and compare various models that use clustering algorithms to detect outliers in the data preprocessing stage of classification, in order to investigate their effect on classification accuracy. The clustering algorithms DBSCAN, HDBSCAN, OPTICS, FuzzyCMeans, and MCMSTClustering (MCMST) were each used separately for outlier elimination in the data preprocessing stage of the k Nearest Neighbor (kNN) classification algorithm, and the results were then compared. According to the obtained results, the MCMST algorithm was the most successful at outlier elimination. The kNN + MCMST model achieved the highest classification accuracy, 0.9834, compared with 0.9719 for kNN without any data preprocessing.

Item
Comparison of Performance of Classification Algorithms Using Standard Deviation-based Feature Selection in Cyber Attack Datasets
(2023) Şenol, Ali
Supervised machine learning techniques are commonly used in many areas, such as finance, education, healthcare, and engineering, because of their ability to learn from past data. However, such techniques can be very slow if the dataset is high-dimensional, and irrelevant features may reduce classification success. Therefore, feature selection or feature reduction techniques are commonly used to overcome these issues. On the other hand, information security for both people and networks is crucial, and it must be ensured without wasting time.
Hence, feature selection approaches that can make the algorithms faster without reducing classification success are needed. In this study, we compare both the classification success and the run-time performance of state-of-the-art classification algorithms using standard deviation-based feature selection on security datasets. For this purpose, we applied standard deviation-based feature selection to the KDD Cup 99 and Phishing Legitimate datasets to select the most relevant features, and then we ran the selected classification algorithms on the datasets to compare the results. According to the obtained results, the classification success of all algorithms was satisfactory, and Decision Tree (DT) performed best. On the other hand, while Decision Tree, k Nearest Neighbors, and Naïve Bayes (NB) were sufficiently fast, Support Vector Machine (SVM) and Artificial Neural Networks (ANN or NN) were too slow.

Item
ImpKmeans: An Improved Version of the K-Means Algorithm, by Determining Optimum Initial Centroids, based on Multivariate Kernel Density Estimation and Kd-Tree
(Budapest Tech Polytechnical Institution, 2024) Şenol, Ali
K-means is the best-known clustering algorithm because of its simplicity, speed, and efficiency. However, the resulting clusters are influenced by the randomly selected initial centroids, and many techniques have been implemented to address this issue. In this paper, a new version of the k-means clustering algorithm, ImpKmeans (An Improved Version of K-Means Algorithm by Determining Optimum Initial Centroids Based on Multivariate Kernel Density Estimation and Kd-tree), is proposed; it uses kernel density estimation to find the optimum initial centroids. Kernel density estimation is used because it is a nonparametric distribution estimation method that can identify dense regions. To understand the efficiency of ImpKmeans, we compared it with some state-of-the-art algorithms.
According to the experimental studies, the proposed algorithm was better than the compared versions of k-means. ImpKmeans was the most successful algorithm in 46 of 60 tests, while the second-best algorithm was the best in 34 tests. Moreover, the experimental results indicated that ImpKmeans is fast compared to the selected k-means versions. © 2024, Budapest Tech Polytechnical Institution. All rights reserved.
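
The general idea in the ImpKmeans abstract, choosing initial centroids from dense regions found by kernel density estimation rather than at random, can be illustrated with a minimal sketch. This is not the published ImpKmeans algorithm: it omits the kd-tree, uses a brute-force Gaussian kernel, and the function names, the `bandwidth` and `min_separation` parameters, and the toy data are all assumptions made for illustration.

```python
import math


def gaussian_kde_density(points, bandwidth):
    """Unnormalised Gaussian kernel density estimate at every input point."""
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            total += math.exp(-math.dist(p, q) ** 2 / (2.0 * bandwidth ** 2))
        densities.append(total)
    return densities


def density_seeds(points, k, bandwidth, min_separation):
    """Pick up to k seeds: the densest points that lie at least
    min_separation apart (hypothetical heuristic, not the paper's rule)."""
    densities = gaussian_kde_density(points, bandwidth)
    order = sorted(range(len(points)), key=lambda i: densities[i], reverse=True)
    seeds = []
    for i in order:
        if all(math.dist(points[i], s) >= min_separation for s in seeds):
            seeds.append(points[i])
        if len(seeds) == k:
            break
    return seeds  # may hold fewer than k seeds if the data is too compact


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of each cluster (empty clusters keep their centroid).
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Toy data: two well-separated groups of four points each.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
    (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (4.9, 5.2),
]
seeds = density_seeds(points, k=2, bandwidth=0.5, min_separation=2.0)
centroids, labels = kmeans(points, seeds)
```

Because the seeds are deterministic, repeated runs on the same data give the same clusters, which is the property that random initialisation lacks. The brute-force density estimate costs quadratic time in the number of points; a spatial index such as a kd-tree is the usual way to reduce that cost.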