DSpace Archive :: DSpace Angular

Search Results

Showing 1 - 8 of 8

A novel approach for spam email detection based on shifted binary patterns
(Wiley-Blackwell, 2016-01-11) Kaya, Yılmaz; Ertuğrul, Ömer Faruk
Advances in communication give people the flexibility to communicate in various ways. Electronic mail (email) is one of the most widely used communication methods for personal and business purposes. However, it brings with it one of the most troublesome problems, spam email, which also raises concerns about data safety. Detecting spam is therefore crucial for keeping users safe and sparing them the time wasted dealing with it. In this study, an effective approach was used to extract quantitative features from emails for spam detection. It is based on the probability of occurrence of characters that have similar orderings with respect to their UTF-8 values, obtained with the shifted one-dimensional local binary pattern (shifted-1D-LBP). Shifted-1D-LBP, which can be described as an ordered set of binary comparisons of the center value with its neighboring values, is a content-based approach to spam detection that uses low-level information. To validate the performance of the proposed approach, three benchmark corpora were used: SpamAssassin, Ling-Spam, and TREC. The average classification accuracies of the proposed approach were 92.34%, 92.57%, and 95.15%, respectively. The analysis and the promising experimental results indicate that the proposed approach is a very competitive feature extraction method for spam email filtering.
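The binary comparison of a center value with its neighbors, as described in this abstract, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and is not taken from the paper: the function names, the `left` and `right` window sizes, the `>=` comparison rule, and the use of raw UTF-8 byte values as the "UTF-8 values" of characters.

```python
from collections import Counter


def shifted_1d_lbp_codes(text, left=4, right=4):
    """Return one binary-pattern code per position that has a full window.

    Assumption: `left` neighbors before and `right` neighbors after the
    center; unequal values shift the center within the window.
    """
    values = list(text.encode("utf-8"))  # assumption: byte values as "UTF-8 values"
    codes = []
    for i in range(left, len(values) - right):
        center = values[i]
        neighbors = values[i - left:i] + values[i + 1:i + 1 + right]
        code = 0
        for v in neighbors:
            # One bit per neighbor: 1 if the neighbor is not smaller than the center.
            code = (code << 1) | (1 if v >= center else 0)
        codes.append(code)
    return codes


def pattern_histogram(text, left=4, right=4):
    """Relative frequency of each possible code: a fixed-length feature vector."""
    codes = shifted_1d_lbp_codes(text, left, right)
    counts = Counter(codes)
    total = len(codes)
    n_bins = 2 ** (left + right)
    return [counts.get(c, 0) / total if total else 0.0 for c in range(n_bins)]
```

A histogram built this way has a fixed length of `2 ** (left + right)` regardless of message length, so it could be passed to any standard classifier.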
A novel approach for extracting ideal exemplars by clustering for massive time-ordered datasets
(TÜBİTAK, 2017-07-30) Ertuğrul, Ömer Faruk
The number and size of massive datasets grow day by day, which makes the machine learning stages more complex because of high computational costs. To decrease the computational cost, many methods have been proposed in the literature, such as data condensing, feature selection, and filtering. Although clustering methods are generally employed to divide samples into groups, another way of condensing data is to determine ideal exemplars (or prototypes), which can be used instead of the whole dataset. In this study, the efficiency of the traditional data-condensing-by-clustering approach was first confirmed according to the accuracies and condensing ratios obtained in 9 different synthetic or real batch datasets. This approach was then improved so that it can be employed on time-ordered datasets. To validate the proposed approach, 23 different real time-ordered datasets were used in experiments. The mean RMSEs were 0.27 with the condensed datasets (mean condensing ratio 97.17%) and 0.29 with the whole datasets. The results showed that the proposed approach achieved higher accuracy rates and condensing ratios.
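The idea of replacing a dataset with cluster exemplars can be sketched as follows. This covers only the traditional batch case with a plain k-means loop; the time-ordered extension is not described in enough detail here to reproduce. The function names, the evenly spaced initialization, and the definition of the condensing ratio as `1 - exemplars / samples` are assumptions for illustration.

```python
def condense_by_kmeans(points, k, iterations=20):
    """Replace `points` (a list of equal-length tuples) with k centroid exemplars."""
    # Assumption: deterministic initialization from k points spread through the list.
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids


def condensing_ratio(n_samples, n_exemplars):
    """Assumed definition: the fraction of samples removed by condensing."""
    return 1 - n_exemplars / n_samples
```

For example, condensing four points that form two tight pairs into `k=2` exemplars yields the two pair midpoints and a condensing ratio of 0.5.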
Doküman dili tanıma için yeni bir öznitelik çıkarım yaklaşımı: İkili desenler [A new feature extraction approach for document language identification: Binary patterns]
(Gazi Üniversitesi, 2016-12-14) Kaya, Yılmaz; Ertuğrul, Ömer Faruk
Language identification (LI), one of the important subtopics of natural language processing, is the task of determining the language in which a document is written from its content. In this study, a new language identification approach, one-dimensional local binary patterns (1D-LBP), is proposed; it uses binary patterns obtained by comparing the UTF-8 values of characters with one another. The proposed method was tested on four datasets containing texts from different numbers of languages. Using the features extracted from the documents with 1D-LBP, classification was carried out with different machine learning methods. The classification accuracies for the four datasets were 86.20%, 92.75%, 100%, and 89.77%, respectively. The results show that the proposed feature extraction method provides significant patterns for language identification.
A noninvasive time-frequency-based approach to estimate cuffless arterial blood pressure
(TÜBİTAK, 2018-09-28) Ertuğrul, Ömer Faruk; Sezgin, Necmettin
Arterial blood pressure (ABP) is one of the most important vital signs in the prophylaxis and treatment of blood pressure-related diseases, because raised blood pressure is the most significant cause of death and the second major cause of disability in the world. Higher ABP puts greater strain on the arteries, and this extra strain makes them thicker, less flexible, and narrower. This increases the likelihood of a burst or occluded artery, which are the primary causes of heart attacks, kidney disease, and strokes. In addition to its importance in monitoring cardiovascular homeostasis, measurement of ABP is imperative in surgical operations. In this study, a simple and effective approach was proposed to estimate ABP from electrocardiogram (ECG) and photoplethysmograph (PPG) signals by using an extreme learning machine (ELM) and the statistical properties of the ECG and/or PPG signals in the time-frequency domain. To evaluate and apply the proposed approach, the Cuffless Blood Pressure Estimation Dataset, which was published and shared by UCI, was employed. First, the statistical properties were extracted from the ECG and PPG signals in the time-frequency domain. The extracted features were then used to estimate cuffless ABP for each subject with the ELM and some popular machine learning methods. The achieved results, together with results reported in the literature, showed that the proposed approach can be successfully employed for estimating cuffless blood pressure (BP) from ECGs and/or PPGs. Additionally, with the proposed approach, the systolic BP, mean BP, and diastolic BP can be calculated simultaneously.
A basic and brief scheme of an application of a machine learning process
(Batman Üniversitesi, 2017) Ertuğrul, Ömer Faruk; Tağluk, Mehmet Emin; Kaya, Yılmaz
Machine learning (ML) methods are powerful tools for modeling systems or extracting knowledge about a phenomenon from samples. This paper was written to make the machine learning process clearer. To that end, the reason for using each stage of the process is given briefly. The Highleyman dataset was then used to test ML methods.
A fast feature selection approach based on extreme learning machine and coefficient of variation
(TÜBİTAK, 2017-07-30) Ertuğrul, Ömer Faruk; Tağluk, Mehmet Emin
Feature selection is the process of reducing the size of the data without degrading the accuracy obtained with them. In this study, we propose a novel feature selection approach based on extreme learning machines (ELMs) and the coefficient of variation (CV). In the proposed approach, the most relevant features are identified by ranking each feature with the coefficient obtained through ELM divided by CV. The accuracies and computational costs obtained with the features selected via the proposed approach in 9 classification and 26 regression benchmark data sets were compared to those obtained with all features, as well as to those obtained with the features selected by a wrapper and a filtering method. The accuracy values obtained with the proposed approach were generally higher than when using all features. Furthermore, high feature reduction ratios were obtained with the proposed approach: the feature reduction ratios achieved in the epilepsy, liver, EMG, shuttle, abalone, and stock data sets were 90.48%, 90%, 70.59%, 66.67%, 75%, and 77.78%, respectively. This approach is an extremely fast process that is independent of the employed machine learning methods.
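The ranking rule stated in this abstract can be written compactly. The notation is an assumption for illustration: $w_j$ stands for the coefficient obtained through the ELM for feature $j$ (the abstract does not say how it is derived), $\mu_j$ and $\sigma_j$ are the mean and standard deviation of feature $j$, and $s_j$ is the resulting ranking score.

```latex
\mathrm{CV}_j = \frac{\sigma_j}{\mu_j},
\qquad
s_j = \frac{w_j}{\mathrm{CV}_j}
```

Because $\mathrm{CV}_j$ needs only one pass over each feature column, a score of this form is consistent with the low computational cost the abstract reports.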
Determining optimal artificial neural network training method in predicting the performance and emission parameters of a biodiesel-fueled diesel generator
(International Journal of Automotive Engineering and Technologies, 2019-04-03) Altun, Şehmus; Ertuğrul, Ömer Faruk
Artificial neural network (ANN) methods have been employed and suggested for modeling the emissions and performance of a diesel generator fueled with biodiesel derived from waste cooking oil during steady-state operation. Such studies generally focus on determining the optimal network structure, but the modeling accuracy of an ANN is also highly dependent on the training method employed. In the modeling, the operating conditions and fuel blend ratio were used as the inputs, while the performance and emission parameters were the outputs. The modeling results obtained by conventional ANNs trained with the back propagation (BP) learning algorithm, radial basis function (RBF), and extreme learning machine (ELM) were compared with the experimental results and with each other. The accuracy of the estimations by ELM was above 95% for all output parameters except specific fuel consumption and thermal efficiency. Moreover, ELM performed better than BP and RBF, with a lower mean relative error (MRE), when the emissions were estimated. The ELM provided correlation coefficients of 0.987, 0.950, and 0.996 for unburned hydrocarbons (HCs), nitrogen oxides (NOx), and smoke opacity (SO), respectively, while for BP they were 0.973, 0.818, and 0.993, and for RBF 0.975, 0.640, and 0.981. The most suitable training function for each emission and performance parameter of the diesel generator was determined based on the obtained accuracies.
A novel feature extraction approach in SMS spam filtering for mobile communication: one-dimensional ternary patterns
(Wiley-Blackwell, 2016-10-19) Kaya, Yılmaz; Ertuğrul, Ömer Faruk
The importance and use of mobile communication are increasing day by day, and the short message service (SMS) is one of its forms. Although SMS is a widely used means of communication, it brings with it a major problem: SMS spam messages. SMS spam not only wastes mobile communication traffic but also disturbs users. To counter it, blacklisting methods, statistical methods built on the frequency of occurrence of words or characters, and machine learning methods have been employed. Because penalties and laws are not enough to solve this problem, and the GSM number used to send SMS spam can easily be changed, a content-based approach is needed. Content-based methods have shown high success in spam email filtering, but SMS spam filtering is harder because SMS messages are extremely short and generally contain many abbreviations. In this study, an image processing method, the local ternary pattern, was adapted to extract features from SMS messages in the feature extraction stage. In the proposed one-dimensional ternary patterns, the text message was first converted to its UTF-8 values. Each character (its UTF-8 value) in the message was then compared with its neighbors. Two different feature sets were extracted from the results of these comparisons. Finally, some machine learning methods were employed to classify these features. To validate the proposed approach, three different SMS corpora were used. The achieved accuracies and the other performance measures employed showed that the proposed approach, one-dimensional ternary patterns, can be effectively employed in SMS spam filtering.
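The comparison step described in this abstract can be sketched as below. It follows the usual local ternary pattern convention of splitting the three-valued comparison into an "upper" and a "lower" binary code, which gives two sets of codes. The function name, the `half_window` and `threshold` parameters and their defaults, and the use of UTF-8 byte values are assumptions for illustration, not the settings used in the paper.

```python
def ternary_codes(text, half_window=2, threshold=2):
    """Return (upper_codes, lower_codes) for every position with a full window.

    Each neighbor is compared with the center value:
      difference >=  threshold -> contributes a 1 bit to the upper code
      difference <= -threshold -> contributes a 1 bit to the lower code
      otherwise                -> contributes 0 to both codes
    """
    values = list(text.encode("utf-8"))  # assumption: byte values as "UTF-8 values"
    upper_codes, lower_codes = [], []
    for i in range(half_window, len(values) - half_window):
        center = values[i]
        neighbors = values[i - half_window:i] + values[i + 1:i + 1 + half_window]
        upper = lower = 0
        for v in neighbors:
            diff = v - center
            upper = (upper << 1) | (1 if diff >= threshold else 0)
            lower = (lower << 1) | (1 if diff <= -threshold else 0)
        upper_codes.append(upper)
        lower_codes.append(lower)
    return upper_codes, lower_codes
```

Histograms of the upper and lower codes could then serve as two separate feature sets for a classifier.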
