27.06.2013 Views

6th European Conference - Academic Conferences


Jaime Acosta

Figure 3: Average percentage of the large set that is accounted for by common substrings of the small set

7. Conclusions and future work

This paper has presented a technique for similarity analysis of malware, based on dynamic behavior captured using CWSandbox. The results show that the similarities are not restricted to short sequences; many long sequences are shared among the malware instances. This means that many shared behaviors are present that could be identified and possibly labeled using natural language to reduce an analyst’s workload, matching the intentions of Kirillov et al. (2010).
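
To illustrate what a shared sequence means here, the following minimal sketch finds the longest contiguous run of events common to two behavior traces. It is not the implementation used in this paper: the function name `longest_common_run` and the event labels are hypothetical, and each trace is assumed to be a plain list of event names.

```python
def longest_common_run(a, b):
    """Return the longest contiguous run of events shared by traces a and b."""
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one step earlier in both traces.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical traces; the event names are illustrative only.
trace_a = ["load_dll", "open_file", "write_file", "connect", "send"]
trace_b = ["open_key", "open_file", "write_file", "connect", "wait"]
shared = longest_common_run(trace_a, trace_b)
```

A run such as the one stored in `shared` is the kind of unit that an analyst could label once and then recognize in other samples.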

Future work will test the methods described in this paper with a larger dataset. In addition, rather than limiting the process to sequential instructions, it may be useful to identify templates of behavior, as Christodorescu et al. (2005) did for static malware analysis. For example, one trace may contain a sequence of five wait events and another a sequence of ten. Semantically, these are almost equivalent, but the common substring algorithm presented here does not capture this; a template method could. Tailoring to malware some of the techniques used to identify code clones, such as those in Roy and Cody (2007), may also prove useful.
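
One very simple way to approximate the wait-event example is to collapse consecutive repeats of the same event before comparing traces. The sketch below is only an assumption about how such a normalization might look, not a template method from the cited work; the helper name `collapse_runs` and the event labels are hypothetical.

```python
from itertools import groupby


def collapse_runs(trace):
    """Replace each run of identical consecutive events with a single event."""
    return [event for event, _ in groupby(trace)]


# Hypothetical traces that differ only in the number of wait events.
trace_five = ["open_file"] + ["wait"] * 5 + ["send"]
trace_ten = ["open_file"] + ["wait"] * 10 + ["send"]

raw_equal = trace_five == trace_ten
collapsed_equal = collapse_runs(trace_five) == collapse_runs(trace_ten)
```

Here the raw traces differ, while their collapsed forms are identical. A real template approach would need to be more general, since collapsing repeats discards counts that may matter in other contexts.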

The work described here is an initial step toward a tool that can semantically label portions of files, allowing more efficient identification of both redundancy (use of legitimate third-party libraries) and overlap (reuse of malware code) in malware instances.

Acknowledgments

I would like to thank Victor Mena, Ken Fabela, and Michael Shaughnessy for their valuable comments and suggestions, which led to the maturation of this work. I would also like to thank Konrad Rieck and colleagues for the dataset and feedback.

References

Baecher, P., Koetter, M., Holz, T., Dornseif, M. and Freiling, F. (2006) “The Nepenthes platform: An efficient approach to collect malware”, Recent Advances in Intrusion Detection, No. 4219, pp 165–184.

Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C. and Kirda, E. (2009) “Scalable, behavior-based malware clustering”, Network and Distributed System Security Symposium (NDSS).

Bayer, U., Moser, A., Krügel, C. and Kirda, E. (2006) “Dynamic analysis of malicious code”, Journal in Computer Virology, Vol. 2, No. 1, pp 67–77.

Christodorescu, M., Jha, S., Seshia, S.A., Song, D. and Bryant, R.E. (2005) “Semantics-Aware Malware Detection”, IEEE Symposium on Security and Privacy, pp 32–46.
