Classification of Malware Based on String and Function Feature Selection

MD Rafiqul Islam, Ronghua Tian, Lynn Batten, Steve Versteeg

Research output: Book chapter/Published conference paper; Conference paper; peer-review

63 Citations (Scopus)


Anti-malware software producers are continually challenged to identify and counter new malware as it is released into the wild. A dramatic increase in malware production in recent years has rendered the conventional method of manually determining a signature for each new malware sample untenable. This paper presents a scalable, automated approach for detecting and classifying malware by using pattern recognition algorithms and statistical methods at various stages of the malware analysis life cycle. Our framework combines the static features of function length and printable string information extracted from malware samples into a single test which gives classification results better than those achieved by using either feature individually. In our testing we input feature information from close to 1400 unpacked malware samples to a number of different classification algorithms. Using k-fold cross validation on the malware, which includes Trojans and viruses, along with 151 clean files, we achieve an overall classification accuracy of over 98%.
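The abstract does not specify how the two static features are encoded or combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes function lengths have already been obtained from a disassembler, bins them into a histogram, and appends binary indicators for printable strings from a fixed vocabulary. The names `printable_strings`, `combined_vector`, and `k_fold_indices`, the bin edges, and the minimum string length are all hypothetical choices made for illustration.

```python
import random
import re


def printable_strings(data: bytes, min_len: int = 4) -> set:
    """Return the set of printable ASCII runs of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return {m.decode("ascii") for m in re.findall(pattern, data)}


def combined_vector(function_lengths, strings, vocabulary, bins):
    """Concatenate a function-length histogram with string-presence bits.

    bins is an ascending list of bin edges (an assumed encoding); a length
    falls into the bucket given by the number of edges it reaches or exceeds.
    """
    hist = [0] * (len(bins) + 1)
    for length in function_lengths:
        hist[sum(length >= edge for edge in bins)] += 1
    string_bits = [1 if s in strings else 0 for s in vocabulary]
    return hist + string_bits


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


if __name__ == "__main__":
    sample = b"\x00\x01kernel32.dll\x00ab\x00CreateFileA\xff"
    found = printable_strings(sample)
    vocab = ["kernel32.dll", "CreateFileA", "WriteFile"]
    print(combined_vector([10, 50, 300], found, vocab, bins=[32, 256]))
    for train, test in k_fold_indices(10, 5):
        print(len(train), len(test))
```

Each combined vector could then be passed to any off-the-shelf classifier, with `k_fold_indices` supplying the train/test partitions; which classifiers and how many folds the paper used is not stated here.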
Original language: English
Title of host publication: CTC 2010
Subtitle of host publication: 2nd Proceedings
Place of Publication: United States
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 9
Publication status: Published - 2010
Event: Cybercrime and Trustworthy Computing Workshop (CTC) - Ballarat, VIC, Australia
Duration: 19 Jul 2010 to 20 Jul 2010



