A comparison of the classification of disparate malware collected in different time periods

Rafiqul Islam, Ronghua Tian, Veelasha Moonsamy, Lynn Batten

It has been argued that an anti-virus strategy based on malware collected at a certain date will not work at a later date. The reasoning is that malware evolves rapidly, so the anti-virus engine is then faced with a completely new type of executable that is less amenable to detection than the first.
In this paper, we test this idea by collecting two sets of malware, the first from 2002 to 2007 and the second from 2009 to 2010. We then determine how well the anti-virus strategy we developed on the earlier set [14] performs on the later set. This anti-virus strategy integrates dynamic and static features extracted from the executables to classify malware by distinguishing between families.
The resulting classification accuracies are very close for the two datasets, differing by only 5.4%, with the older malware classified more accurately than the newer malware. This leads us to conjecture that current anti-virus strategies can indeed be modified to deal effectively with new malware.
