e-ISSN 2231-8526
ISSN 0128-7680
Surender Singh and Ashutosh Kumar Singh
Pertanika Journal of Science & Technology, Volume 26, Issue 3, July 2018
Keywords: Content and link based features, correlation based feature selection, data mining, filter and wrapper model, particle swarm optimization, spam
Published on: 31 Jul 2018
Spamming is a major issue in the area of web search. Many link-based and content-based features are used to classify pages as spam or non-spam. This paper proposes CFS+PSO, which combines the advantages of swarm behaviour (randomness and global communication between particles) with the Correlation Based Feature Selection (CFS) technique. The objective of feature selection is to build a compact model with improved performance in both time and accuracy. The performance of CFS+PSO is evaluated on WEBSPAM-UK2006 with Multilayer Perceptron (MLP), Naïve Bayes, Support Vector Machine (SVM), J48 and AdaBoost classifiers. Experimental results show a large reduction in the number of features and in computational time, together with improvements in the accuracy measures (F1 score and AUC).
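To make the idea of pairing CFS with PSO concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary PSO with a sigmoid transfer function, scores each feature subset with the standard CFS merit k·r_cf / sqrt(k + k(k−1)·r_ff), and uses absolute Pearson correlation as a stand-in correlation measure. The function names (`pearson`, `cfs_merit`, `cfs_pso`) and all parameter values are hypothetical; the abstract does not state the paper's correlation measure or PSO settings.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def cfs_merit(subset, r_cf, r_ff):
    """CFS merit k*r_cf / sqrt(k + k(k-1)*r_ff) for a list of feature indices."""
    k = len(subset)
    if k == 0:
        return 0.0  # an empty subset carries no information
    mean_cf = sum(r_cf[i] for i in subset) / k
    pairs = [(i, j) for i in subset for j in subset if i < j]
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)

def cfs_pso(columns, labels, n_particles=20, n_iter=40,
            w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over feature masks scored by CFS merit (illustrative settings).

    columns: list of feature columns; labels: numeric class labels.
    Returns (selected feature indices, merit of that subset).
    """
    rng = random.Random(seed)
    d = len(columns)
    # Assumption: absolute Pearson correlation as the correlation measure.
    r_cf = [abs(pearson(col, labels)) for col in columns]
    r_ff = [[abs(pearson(a, b)) for b in columns] for a in columns]

    def score(bits):
        return cfs_merit([i for i, b in enumerate(bits) if b], r_cf, r_ff)

    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [score(p) for p in pos]
    g = max(range(n_particles), key=pbest_score.__getitem__)
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iter):
        for p in range(n_particles):
            for j in range(d):
                # Velocity update: inertia + pull to personal best + pull to global best.
                v = (w * vel[p][j]
                     + c1 * rng.random() * (pbest[p][j] - pos[p][j])
                     + c2 * rng.random() * (gbest[j] - pos[p][j]))
                vel[p][j] = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid responsive
                # The sigmoid of the velocity is the probability that the bit is set.
                pos[p][j] = 1 if rng.random() < 1 / (1 + math.exp(-vel[p][j])) else 0
            s = score(pos[p])
            if s > pbest_score[p]:
                pbest[p], pbest_score[p] = pos[p][:], s
                if s > gbest_score:
                    gbest, gbest_score = pos[p][:], s
    return [i for i, b in enumerate(gbest) if b], gbest_score
```

Calling `cfs_pso(columns, labels)` returns the indices of the selected features, which could then be passed to any of the classifiers named above. Because the merit depends only on precomputed correlations, this is a filter-style search and no classifier is trained inside the loop.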