6th European Conference - Academic Conferences

Jaime Acosta
The research presented here uses a dataset that consists of sandbox event traces of 3131 malware instances. Manual observation of the dataset revealed behavior patterns shared across many instances, such as file replacements (which involve a series of system calls). At first glance these patterns seem complex and overwhelming, but they were made simple by replacing the common behaviors with short annotations. This paper is a step toward automating this process. The contributions of the work described in this paper are as follows.
This research provides a methodology that shows how the longest common substring algorithm can be modified to conduct similarity analysis on malware using dynamic event traces. This similarity may be due to code reuse, which arises both from legitimate third-party libraries and from the reuse of infected or malicious code.
Use of this algorithm shows that in this dataset of malware, even though the instances are of different types (as assigned by anti-virus programs), there are a large number of common behaviors. This indicates that malware authors reuse code, and that an analyst could use this to eliminate duplicate processing.
This research shows that the common behaviors identified are not limited to short, trivial event sequences; there are many long sequences. This indicates that it may be possible to replace semantically rich events with natural language annotations to facilitate analysis.
2. Related work
Because of the large growth in malware instances introduced each year, a large amount of work has been done to aid each stage of the malware analysis workflow. The first step in analysis is data collection. Tools that aid in this collection include Nepenthes (Baecher et al., 2006), Amun (Göbel, 2009), and HoneyPots (Provos, 2004). After collection, the malware instances are analyzed using static (source code) or dynamic (event traces) techniques.
In the past decade a wide variety of techniques have been used for static and dynamic analysis of legitimate source code, with the goal of exploiting program semantics in an efficient way (Cornelissen, 2009). For malware specifically, many techniques exploit characteristics unique to malware, including malicious behavior, small program size, and code reuse among instances.
In both static and dynamic analysis, one method that has received recent attention is the use of machine learning to cluster similar malware instances. Clustering methods are useful because they generalize large sets of malware into categories with limited need for manual human intervention. Jang and Brumley (2009) perform static analysis, identifying areas of code reuse by clustering malware binaries. Their clustering method uses Bloom filters, which identify similarity of malware instances by applying hashing techniques to fixed-size chunks of the malware executable code. Bayer et al. (2009), on the other hand, use machine learning algorithms to identify similarities in malware instances by comparing their dynamic event traces, which include system calls, their dependencies, and network behavior. The malware instances are then clustered based on their dynamic behavior. A limitation of this approach is that the algorithm is trained with a fixed set of malware; it does not allow retraining with additional malware samples during the clustering phase. Rieck et al. (2010) extend this with the Malheur system, which establishes an iterative mechanism that consists of clustering and then classifying new instances into existing clusters. In that work, similarity is determined by the presence of shared fixed-length instruction sequences.
In addition, Rieck also uses a dynamic trace representation format called MIST (Trinius et al., 2010) that allows prioritization of event parameters (e.g., an openfile system call may have the file name, file type, and file path as parameters). This is meant to make processing by machine learning algorithms more efficient: leaving out less-critical parameters reduces the input file size. MIST also provides a common file format to which the output of many available sandboxes can be converted.
After the instances are clustered, an analyst may have to conduct a deeper investigation, such as determining the exact differences and similarities in the binaries. Malware in different clusters may share common behaviors, which results in redundant analysis by a human analyst. Another issue is that instances in a cluster are not exactly the same: there may be malicious behavior that is unique to one instance within a cluster. One way to alleviate these issues is to develop techniques that are not tied to sequence length and that automatically detect semantically representative sequences of varying size, instead of determining similarity by using fixed-size sequences as in previous work.
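To make the fixed-length limitation concrete, the following is a minimal sketch of similarity computed over fixed-length windows of an event trace. It is not the Malheur implementation: the Jaccard measure, the function names, and the event names in the toy traces are all assumptions chosen for illustration.

```python
def ngrams(trace, n):
    """Return the set of all length-n windows of an event trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def ngram_similarity(trace_a, trace_b, n=3):
    """Jaccard similarity over fixed-length windows (illustrative assumption)."""
    a, b = ngrams(trace_a, n), ngrams(trace_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical event names; both traces contain the same four-event run.
trace_a = ["file_open", "file_read", "file_write", "file_close", "reg_set"]
trace_b = ["net_connect", "file_open", "file_read", "file_write", "file_close"]

shared = ngrams(trace_a, 3) & ngrams(trace_b, 3)
score = ngram_similarity(trace_a, trace_b, n=3)
print(sorted(shared), score)
```

With a window of three, the shared four-event run is reported as two overlapping fragments rather than as one behavior, which is the kind of dependence on sequence length that a variable-length approach avoids.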
Some techniques that use semantic structure for finding similarity come from code-clone detection research. These techniques have been used to identify redundancy in order to reduce program size, or to identify plagiarism in legitimate software (Roy and Cordy, 2007). The problem with using these techniques to identify similarities and differences in malware is that the source code of malware is not available.
Some attempts have been made to analyze the instruction sequences of disassembled binaries to determine whether they are malicious. One method compares the disassembled code against behavior templates that are known to exist in malware. These templates are able to capture malicious behavior even if the malware has small variations (Christodorescu et al., 2005). Another method (Ye et al., 2007) uses the Intelligent Malware Detection System (IMDS) to identify malware instances by checking whether certain sequences of Application Programming Interface (API) calls exist in a binary Portable Executable (PE) file. A limitation of both of these examples is that they assume the binary file is neither packed nor virtualized.
In this paper the longest common substring algorithm is modified and used to identify common event sequences of varying size among a set of malware. In addition, the algorithm works on the dynamic traces of malware, which are available even if the malware is packed or virtualized.
3. Dataset
3.1 Sandbox environment
The dataset used for this research was obtained from the Malheur website (http://pi1.informatik.unimannheim.de/malheur/) and was collected over a period of three years. In particular, the Reference dataset is used, which consists of the dynamic trace events of 3131 malware instances that are grouped into 24 types, as assigned by six anti-virus scanners. The dynamic traces of the malware instances were generated by CWSandbox. The event traces range in size from 700 B to 3.4 MB.
The traces are encoded in the Malware Instruction Set (MIST) format and are in sequential order. Furthermore, the traces are separated by the thread behaviors of the executable.
3.2 MIST
The dynamic traces of the malware instances in the dataset are logs of the events that occurred as a result of executing the malware binary. The logs contain details about each event that may be of different levels of interest to an analyst or to analysis software. MIST encodes events in a format that prioritizes log details, e.g., filenames, sleep delay times, and memory addresses associated with each event trace. In total there are 120 system calls that fall into 13 more general categories (e.g., winsock_op, file_open system calls are both in the winsock category). An extensive description and examples of MIST are presented in Trinius et al. (2010).
4. The common substrings algorithm
The algorithm developed to identify shared behaviors in malware instance event traces is a modified version of the well-known longest common substring algorithm (Cormen et al., 2001). The main difference is that the modified version identifies all common substrings of a minimum length, instead of only the longest.
Two main procedures are executed to find the amount of shared behavior in the malware instances. Figure 1 shows the reduction procedure that calculates the amount of common behavior in the event traces. In line 2, all common substrings are stored in the commonSubstrings variable. To process the files efficiently, this step was first run on instances that were labeled with the same malware class, i.e., all event traces within the ALLAPLE malware instances (as assigned by anti-virus software) were compared first, then all EJIK traces, etc. In lines 3-4, the commonSubstrings are sorted in descending order and output to a file. This allows the commonSubstrings to be used to find commonality with other datasets.
In lines 5-9, the occurrences of all strings in commonSubstrings of at least size min are identified in the largeFileSet. They are then counted and removed. Removing the occurrences from the largeFileSet makes it possible to calculate the amount of common behavior that exists in these malware instances (line 10).
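The reduction procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure from Figure 1: the function names, the pairwise comparison of traces, the choice to sort by length, and the definition of "amount of common behavior" as the fraction of events removed are all assumptions, and the single-letter events in the toy traces are hypothetical.

```python
from itertools import combinations


def common_substrings(a, b, min_len):
    """All maximal common contiguous runs of length >= min_len in a and b."""
    found = set()
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Length of the common run ending at a[i-1] and b[j-1].
                cur[j] = prev[j - 1] + 1
                # The run cannot be extended to the right.
                ends = i == len(a) or j == len(b) or a[i] != b[j]
                if ends and cur[j] >= min_len:
                    found.add(tuple(a[i - cur[j]:i]))
        prev = cur
    return found


def remove_occurrences(trace, sub):
    """Remove non-overlapping occurrences of sub; return (new_trace, count)."""
    out, count, i, k = [], 0, 0, len(sub)
    while i < len(trace):
        if tuple(trace[i:i + k]) == sub:
            count += 1
            i += k
        else:
            out.append(trace[i])
            i += 1
    return out, count


def reduce_traces(class_traces, large_file_set, min_len):
    """Collect common substrings, sort them, then count and remove them."""
    subs = set()
    for x, y in combinations(class_traces, 2):
        subs |= common_substrings(x, y, min_len)
    # Assumption: "descending order" means longest substring first.
    ordered = sorted(subs, key=lambda s: (-len(s), s))
    total = sum(len(t) for t in large_file_set)
    removed = 0
    for trace in large_file_set:
        trace = list(trace)
        for sub in ordered:
            trace, n = remove_occurrences(trace, sub)
            removed += n * len(sub)
    # Assumption: common behavior is the fraction of events removed.
    return ordered, (removed / total if total else 0.0)


# Toy traces with hypothetical single-letter events.
traces = [list("abcdx"), list("yabcd"), list("abcz")]
ordered, fraction = reduce_traces(traces, traces, min_len=3)
print(ordered, fraction)
```

Removing the longest substrings first keeps a long shared behavior from being broken up by one of its own shorter fragments. One caveat of this sketch is that removing a run joins its neighbors, which can create adjacencies that were not present in the original trace.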