Hybrids of support vector machine wrapper and filter based framework for malware detection

Shamsul Huda, Jemal Abawajy, Mamoun Alazab, Mali Abdollalihian, MD Rafiqul Islam, John Yearwood

Research output: Contribution to journal (Article)

37 Citations (Scopus)
3 Downloads (Pure)

Abstract

Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current-generation Anti-Virus (AV) engines employ a signature-template type of detection approach, in which malware can easily evade the existing signatures in the database. This reduces the capability of current AV engines to detect malware. In this paper we propose a hybrid framework for malware detection that combines a Support Vector Machine wrapper with Maximum-Relevance–Minimum-Redundancy filter heuristics, where Application Program Interface (API) call statistics are used as malware features. The novelty of our hybrid framework is that it injects the filter’s ranking score into the wrapper selection process and combines the properties of wrappers, filters, and API call statistics, so that malware can be detected based on the nature of its infectious actions instead of its signature. To the best of our knowledge, this kind of hybrid approach has not yet been explored in the literature in the context of feature selection and malware detection. Knowledge about the intrinsic characteristics of malicious activities is derived from the API call statistics and injected as a filter score into the wrapper’s backward elimination process in order to find the most significant APIs. When the most significant APIs are used in the wrapper classification on both obfuscated-malware and benign datasets, the results show that the proposed hybrid framework clearly surpasses the existing models, including the independent filters and wrappers, while using only a very compact set of significant APIs. The performance of the proposed and existing models has further been compared using binary logistic regression. Various goodness-of-fit comparison criteria, such as Chi-square, Akaike’s Information Criterion (AIC), and the Receiver Operating Characteristic (ROC) curve, are used to identify the best-performing models. Experimental outcomes based on the above criteria also show that the proposed hybrid framework outperforms other existing models of signature types, including independent wrapper and filter approaches, in identifying malware.
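
The filter side of the framework described above can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only a plain greedy Maximum-Relevance–Minimum-Redundancy ranking (mutual information with the label, minus mean mutual information with already-selected features), and it omits the SVM wrapper and the injection of the filter score into backward elimination. The API names, the use of binary "was this API called" indicators instead of richer call statistics, and the eight-sample dataset are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, labels):
    """Greedy mRMR ranking (difference criterion).

    At each step, pick the feature that maximises
    relevance I(feature; label) minus its mean mutual information
    with the features that have already been picked.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    remaining = sorted(features)  # sorted so that ties break deterministically

    while remaining:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 1 = malware, 0 = benign (assumption, not from the paper).
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical API-call indicators: 1 if the sample calls the API, else 0.
features = {
    "WriteProcessMemory": [1, 1, 1, 0, 0, 0, 0, 0],  # informative
    "VirtualAllocEx":     [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate of the above
    "CreateRemoteThread": [0, 0, 0, 1, 0, 0, 0, 0],  # weaker, but complementary
    "GetTickCount":       [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the label
}

ranking = mrmr_rank(features, labels)
print(ranking)
```

In this toy run, the second copy of the duplicated indicator is ranked below the weaker but complementary call, because its redundancy with the first pick outweighs its relevance; the call that is unrelated to the label comes last. A hybrid scheme of the kind the abstract describes would feed such a ranking score into the wrapper's elimination step rather than use it on its own.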
Original language: English
Pages (from-to): 376-390
Number of pages: 15
Journal: Future Generation Computer Systems
Volume: 55
Early online date: Aug 2014
DOI: 10.1016/j.future.2014.06.001
Publication status: Published - Feb 2016

Fingerprint

Support vector machines
Application programming interfaces (API)
Application programs
Interfaces (computer)
Statistics
Engines
Malware
Viruses
Redundancy
Logistics
Feature extraction

Cite this

Huda, Shamsul ; Abawajy, Jemal ; Alazab, Mamoun ; Abdollalihian, Mali ; Islam, MD Rafiqul ; Yearwood, John. / Hybrids of support vector machine wrapper and filter based framework for malware detection. In: Future Generation Computer Systems. 2016 ; Vol. 55. pp. 376-390.
@article{738f465822814331aa38780d583b96fc,
title = "Hybrids of support vector machine wrapper and filter based framework for malware detection",
abstract = "Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current-generation Anti-Virus (AV) engines employ a signature-template type of detection approach, in which malware can easily evade the existing signatures in the database. This reduces the capability of current AV engines to detect malware. In this paper we propose a hybrid framework for malware detection that combines a Support Vector Machine wrapper with Maximum-Relevance–Minimum-Redundancy filter heuristics, where Application Program Interface (API) call statistics are used as malware features. The novelty of our hybrid framework is that it injects the filter’s ranking score into the wrapper selection process and combines the properties of wrappers, filters, and API call statistics, so that malware can be detected based on the nature of its infectious actions instead of its signature. To the best of our knowledge, this kind of hybrid approach has not yet been explored in the literature in the context of feature selection and malware detection. Knowledge about the intrinsic characteristics of malicious activities is derived from the API call statistics and injected as a filter score into the wrapper’s backward elimination process in order to find the most significant APIs. When the most significant APIs are used in the wrapper classification on both obfuscated-malware and benign datasets, the results show that the proposed hybrid framework clearly surpasses the existing models, including the independent filters and wrappers, while using only a very compact set of significant APIs. The performance of the proposed and existing models has further been compared using binary logistic regression. Various goodness-of-fit comparison criteria, such as Chi-square, Akaike’s Information Criterion (AIC), and the Receiver Operating Characteristic (ROC) curve, are used to identify the best-performing models. Experimental outcomes based on the above criteria also show that the proposed hybrid framework outperforms other existing models of signature types, including independent wrapper and filter approaches, in identifying malware.",
keywords = "API call statistics, Hybrid wrapper-filter heuristics, Malware detection",
author = "Shamsul Huda and Jemal Abawajy and Mamoun Alazab and Mali Abdollalihian and Islam, {MD Rafiqul} and John Yearwood",
note = "Includes bibliographical references.",
year = "2016",
month = "2",
doi = "10.1016/j.future.2014.06.001",
language = "English",
volume = "55",
pages = "376--390",
journal = "Future Generation Computer Systems: the international journal of grid computing: theory, methods and applications",
issn = "0167-739X",
publisher = "Elsevier",

}

TY - JOUR

T1 - Hybrids of support vector machine wrapper and filter based framework for malware detection

AU - Huda, Shamsul

AU - Abawajy, Jemal

AU - Alazab, Mamoun

AU - Abdollalihian, Mali

AU - Islam, MD Rafiqul

AU - Yearwood, John

N1 - Includes bibliographical references.

PY - 2016/2

Y1 - 2016/2

AB - Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current-generation Anti-Virus (AV) engines employ a signature-template type of detection approach, in which malware can easily evade the existing signatures in the database. This reduces the capability of current AV engines to detect malware. In this paper we propose a hybrid framework for malware detection that combines a Support Vector Machine wrapper with Maximum-Relevance–Minimum-Redundancy filter heuristics, where Application Program Interface (API) call statistics are used as malware features. The novelty of our hybrid framework is that it injects the filter’s ranking score into the wrapper selection process and combines the properties of wrappers, filters, and API call statistics, so that malware can be detected based on the nature of its infectious actions instead of its signature. To the best of our knowledge, this kind of hybrid approach has not yet been explored in the literature in the context of feature selection and malware detection. Knowledge about the intrinsic characteristics of malicious activities is derived from the API call statistics and injected as a filter score into the wrapper’s backward elimination process in order to find the most significant APIs. When the most significant APIs are used in the wrapper classification on both obfuscated-malware and benign datasets, the results show that the proposed hybrid framework clearly surpasses the existing models, including the independent filters and wrappers, while using only a very compact set of significant APIs. The performance of the proposed and existing models has further been compared using binary logistic regression. Various goodness-of-fit comparison criteria, such as Chi-square, Akaike’s Information Criterion (AIC), and the Receiver Operating Characteristic (ROC) curve, are used to identify the best-performing models. Experimental outcomes based on the above criteria also show that the proposed hybrid framework outperforms other existing models of signature types, including independent wrapper and filter approaches, in identifying malware.

KW - API call statistics

KW - Hybrid wrapper-filter heuristics

KW - Malware detection

U2 - 10.1016/j.future.2014.06.001

DO - 10.1016/j.future.2014.06.001

M3 - Article

VL - 55

SP - 376

EP - 390

JO - Future Generation Computer Systems: the international journal of grid computing: theory, methods and applications

JF - Future Generation Computer Systems: the international journal of grid computing: theory, methods and applications

SN - 0167-739X

ER -