Browsing by Author "Yucelbas, Sule"
Now showing 1 - 3 of 3
Item: Analyzing the effect of data preprocessing techniques using machine learning algorithms on the diagnosis of COVID-19 (Wiley, 2022) Erol, Gizemnur; Uzbas, Betul; Yucelbas, Cuneyt; Yucelbas, Sule
Real-time polymerase chain reaction (RT-PCR), known as the swab test, is a laboratory test that diagnoses COVID-19 from respiratory samples. Because of the rapid worldwide spread of the coronavirus, RT-PCR testing has become insufficient for obtaining fast results. The need for diagnostic methods to fill this gap has therefore arisen, and machine learning studies have begun in this area. Medical data, however, are challenging to work with because they are inconsistent, incomplete, difficult to scale, and very large. Additionally, poor clinical decisions, irrelevant parameters, and limited medical data adversely affect the accuracy of such studies. Because datasets containing COVID-19 blood parameters are currently scarcer than other medical datasets, this study aims to improve the existing ones. To obtain more consistent results in COVID-19 machine learning studies, the effect of data preprocessing techniques on the classification of COVID-19 data was investigated. First, categorical feature encoding and feature scaling were applied to a dataset with 15 features containing blood data of 279 patients, including gender and age information. Missing values were then imputed using both the K-nearest neighbor (KNN) algorithm and multiple imputation by chained equations (MICE). The data were balanced with the synthetic minority oversampling technique (SMOTE).
The effect of data preprocessing techniques on the ensemble learning algorithms bagging, AdaBoost, and random forest, and on the popular classifiers KNN, support vector machine, logistic regression, artificial neural network, and decision tree, was analyzed. With SMOTE applied, the highest accuracies obtained with the bagging classifier were 83.42% and 83.74% with KNN and MICE imputation, respectively. Without SMOTE, the highest accuracy reached with the same classifier was 83.91%, with KNN imputation. In conclusion, selected data preprocessing techniques are compared, their effect on classification success is presented, and experimental studies demonstrate the importance of choosing the right combination of preprocessing steps.

Item: Enhanced Cross-Validation Methods Leveraging Clustering Techniques (Int Information & Engineering Technology Assoc, 2023) Yucelbas, Cuneyt; Yucelbas, Sule
The efficacy of emerging and established learning algorithms warrants scrutiny. This examination is intrinsically linked to the results of classification performance. The primary determinant influencing these results is the distribution of the training and test data presented to the algorithms. Existing literature frequently employs standard and stratified k-fold cross-validation methods (S-CV and St-CV) to create training and test data for classification tasks. In the S-CV method, training and test groups are formed via random data distribution, potentially undermining the reliability of the performance results calculated after classification. This study introduces cross-validation strategies based on k-means and k-medoids clustering to address this challenge. These strategies are designed to tackle issues arising from random data distribution. The proposed methods autonomously determine the number of clusters and folds.
Initially, the number of clusters is established via Silhouette analysis, followed by identifying the number of folds according to the data volume within these clusters. An additional aim of this study is to minimize the standard deviation (Std) values between the folds. Particularly in classifying large datasets, the minimized Std removes the need to present each fold to the system, thereby reducing time expenditure and system congestion/fatigue. Analyses were carried out on several large-scale datasets to compare these new CV methods with the S-CV and St-CV techniques. The findings revealed superior performance results for the novel strategies. For instance, the minimum Std value between folds was 0.022, and the maximum accuracy rate achieved was approximately 100%. With the proposed methods, the discrepancy between the performance outputs of each fold and the overall average is statistically minimized. The randomness in creating the training/test groups, previously identified as a factor contributing to this discrepancy, has been significantly reduced. Hence, this study is anticipated to fill a substantial gap in the existing literature concerning the formation of training/test groups in various classification problems and the statistical accuracy of performance results.

Item: Identification of full-night sleep parameters using morphological features of ECG signals: A practical alternative to EEG and EOG signals (Elsevier Sci Ltd, 2024) Yucelbas, Sule; Yucelbas, Cueneyt; Tezel, Guelay; Ozsen, Seral; Yosunkaya, Sebnem
Electroencephalogram (EEG) signals, which are among the most important recordings used in polysomnography for sleep staging, are more challenging and demanding than electrocardiography (ECG) signals, both in terms of acquisition and interpretation.
The literature on sleep parameters shows that EEG signals are predominantly used for determining arousal (AR), K-complex (Kc), and sleep spindle (Ss) parameters, while electrooculography (EOG) signals are employed for detecting slow eye movement (SEM) and rapid eye movement (REM) parameters. This study is a continuation of our previous research, where we used only EEG signals for Kc and Ss detection. In this study, an approach that includes ECG signals in the determination of sleep parameters was adopted to bring practicality to sleep staging studies. For this purpose, 16 morphological features were first extracted from ECG recordings taken from a total of 24 subjects after various preprocessing steps. These data were then used to detect five different sleep parameters (AR, Kc, Ss, SEM, and REM) using the Random Subspace (RaSE) ensemble learning algorithm. The results were calculated according to various statistical criteria, and a classification accuracy of over 78% was obtained for all parameters. The sleep parameters that could be determined most successfully using the ECG signal were SEM and AR, respectively. In addition, feature elimination was performed for these two datasets using Symmetric Uncertainty (SU) ranking. Reclassification using the 9 and 12 features found to be effective for these datasets, respectively, yielded significant increases in the performance outputs. Experimental results have shown that ECG signals can be used as an alternative to EEG and EOG signals in the determination of full-night sleep parameters.
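
The Symmetric Uncertainty (SU) ranking mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) and assumes the features have already been discretized, since the abstract does not describe how the continuous ECG features were binned. The function names and the toy feature names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing is shared
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, labels):
    """Sort features by SU with the class label, highest first."""
    scores = {name: symmetric_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, already-discretized data; the feature names are hypothetical.
labels = [0, 0, 1, 1]
columns = {
    "feature_a": [0, 0, 1, 1],  # mirrors the label exactly
    "feature_b": [0, 1, 0, 1],  # independent of the label
}
ranking = rank_features(columns, labels)
```

In this toy case `feature_a` ranks first because it determines the label completely, while `feature_b` shares no information with it. Keeping only the top-ranked features corresponds to the elimination step described in the abstract, although the cut-off used there (9 and 12 features) is specific to the authors' datasets.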