27.06.2013 Views

6th European Conference - Academic Conferences


Manoj Cherukuri and Srinivas Mukkamala

Figure 5: Flowchart for the process of malicious websites collection.

All the malicious websites listed in these sources were collected and stored in the database. Crawling was performed on the collected malicious websites up to the third hop, as described in Section 4.2, to build the link structure for all the malicious websites.
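The hop-limited crawl described above can be illustrated with a breadth-first traversal. This is a minimal sketch and not the authors' crawler: the function name `crawl_to_depth`, the `fetch_links` callback, and the toy link graph are all hypothetical, and it assumes that "up to the third hop" means pages reached at hop three are recorded but their own links are not followed.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple


def crawl_to_depth(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
    max_hops: int = 3,
) -> Tuple[Dict[str, List[str]], Dict[str, int]]:
    """Breadth-first crawl that follows links at most `max_hops` hops from a seed.

    Returns (link_structure, hop_of): the outgoing links recorded for each
    expanded URL, and the hop count at which each URL was first reached.
    """
    hop_of: Dict[str, int] = {}
    queue = deque()
    for seed in seeds:
        if seed not in hop_of:
            hop_of[seed] = 0
            queue.append(seed)

    link_structure: Dict[str, List[str]] = {}
    while queue:
        url = queue.popleft()
        if hop_of[url] >= max_hops:
            continue  # hop limit reached: keep the URL but do not expand it
        targets = list(fetch_links(url))
        link_structure[url] = targets
        for target in targets:
            if target not in hop_of:  # first visit fixes the hop count
                hop_of[target] = hop_of[url] + 1
                queue.append(target)
    return link_structure, hop_of


# Toy in-memory link graph standing in for real pages (illustrative only).
toy_web = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["a"],
    "c": ["d"],
    "d": ["e"],
}

links, hops = crawl_to_depth(["seed"], lambda u: toy_web.get(u, []), max_hops=3)
print(hops)  # {'seed': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 3}
```

In a real crawl, `fetch_links` would download a page and extract its hyperlinks. Passing it in as a callback keeps the traversal logic testable without network access. Setting `max_hops=2` would correspond to a shallower, second-hop crawl.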

Finally, the domains obtained during the first two processes of the data collection were translated to their geographical locations using the process described in Section 4.3. To perform a comparative analysis of the malicious websites against legitimate websites, the top 1500 websites were downloaded from Alexa (Top Sites, 2010), a source for top websites, and were crawled up to the second hop. This set is referred to as the set of legitimate or non-malicious websites in the remaining part of this paper.

Around 350,000 distinct malicious websites were collected from the previously mentioned sources. Since these domains had been detected and flagged as malicious, a major portion of them were down at the time of the analysis: only about 20,000 distinct URLs, roughly 5.7% of the malicious websites collected, were alive. Link analysis was performed on about 19,000 of the 20,000 live malicious websites. The remaining websites did not have any text on which to perform link analysis, as they pointed to files such as executables, JARs and other binaries.
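The proportions quoted above can be checked with a few lines of arithmetic. The sketch below uses the rounded counts from the text ("around 350,000", "about 20,000", "about 19,000"), so the results are approximate. The variable names are illustrative.

```python
# Rounded counts as reported in the text; the exact figures are not given.
collected = 350_000  # distinct malicious websites collected
live = 20_000        # distinct URLs alive at analysis time
analysed = 19_000    # live websites on which link analysis was performed

live_share = live / collected    # fraction of collected sites still alive
analysed_share = analysed / live  # fraction of live sites with analysable text
without_text = live - analysed    # live sites pointing to non-text files

print(f"alive: {live_share:.1%}")                 # alive: 5.7%
print(f"link-analysed: {analysed_share:.1%}")     # link-analysed: 95.0%
print(f"no analysable text: about {without_text}")  # no analysable text: about 1000
```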

Around 600,000 Uniform Resource Locators (URLs) were crawled during the collection of our dataset. The URLs were crawled at a rate of 50 URLs per minute. Of the live malicious websites, 14,970 domains were hosted in the United States. The top five countries contributed 83% of the total malicious websites in our dataset.
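As a rough illustration of what the stated crawl rate implies, the sketch below estimates the total crawling time. It assumes a constant rate of 50 URLs per minute and a single sequential crawler, neither of which the text confirms. The function name `crawl_duration` is hypothetical.

```python
def crawl_duration(url_count: int, urls_per_minute: float) -> dict:
    """Estimate crawl time, assuming a constant rate with no pauses or retries."""
    minutes = url_count / urls_per_minute
    hours = minutes / 60
    days = hours / 24
    return {"minutes": minutes, "hours": hours, "days": days}


# Rounded figures from the text: about 600,000 URLs at 50 URLs per minute.
estimate = crawl_duration(600_000, 50)
print(f"{estimate['minutes']:.0f} minutes")  # 12000 minutes
print(f"{estimate['hours']:.0f} hours")      # 200 hours
print(f"{estimate['days']:.1f} days")        # 8.3 days
```

Under these assumptions, the crawl would take a little over eight days of continuous operation.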

