Scaling to the Adversary

Machine Learning Driven Mining of Threat Intel from the Darkweb

Paulo Shakarian, Ph.D.

CEO, IntelliSpyre, Inc.

Fulton Entrepreneurial Professor, Arizona State University

POC2016

November 10-11, 2016 Seoul, Korea


The average cost per data breach is $4M (IBM, 2016).

Reactive incident response to attacks is costly.

IntelliSpyre helps companies avoid cyber attacks through proactive machine-learning-driven darkweb threat intelligence.

Actual Deepweb screenshot

Actual Darkweb screenshot


Imminent/directed

“A hacktivist group is launching a campaign against company X.”

Change to Threat Landscape

“A 0-day for the latest build of a certain operating system is available.”

Change to Threat Landscape

“The price of Android exploits dropped.”

Change to Threat Landscape

“A prolific darkweb forum poster advertised his first zero-day for sale.”


Situational Awareness
Most current “threat intel”
Mandiant, etc.
Immediate action (e.g. block IP address)
Info sharing, honeypot data

Imminent, directed threats
Most current “darkweb intel”
Prepare for cyber-attack (e.g. DDoS)
Service providers monitor for mention of specific company

Change to threat landscape (Atmospherics)
Very few, mostly in R&D and academia
Allow for more strategic decisions (e.g. not using certain software)
Involves ingesting multiple sources close to hackers; necessitates machine learning, artificial intelligence, and related techniques


One approach is to obtain information from the darkweb using human analysts.

But the darkweb is growing quickly: it doubled in the first half of 2016.


Actual darkweb screenshot

Team members with cultural and linguistic skills identify malicious hacking pages

Proprietary data mining and machine learning techniques automatically and regularly obtain information

Information stored in a unified database schema allows queries across multiple darkweb sources

SaaS-based front end and standards-based API
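As a rough illustration of the unified-schema idea, the sketch below stores listings from several hypothetical sources in one SQLite table and runs a single query across all of them. The table name, column names, site identifiers, and sample rows are assumptions made for illustration; they are not the actual IntelliSpyre schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one table shared by every source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE listings (
            source    TEXT NOT NULL,  -- hypothetical site identifier
            vendor    TEXT NOT NULL,
            title     TEXT NOT NULL,
            price_btc REAL
        )
        """
    )
    rows = [
        ("market_a", "vendor1", "Android exploit kit", 1.5),
        ("market_b", "vendor1", "Keylogger builder", 0.2),
        ("forum_c", "vendor2", "Android RAT", 0.8),
    ]
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", rows)
    return conn


def search_all_sources(conn, keyword):
    """Return (source, vendor, title) for listings whose title contains keyword."""
    cur = conn.execute(
        "SELECT source, vendor, title FROM listings "
        "WHERE title LIKE ? ORDER BY source",
        (f"%{keyword}%",),
    )
    return cur.fetchall()


conn = build_demo_db()
android_hits = search_all_sources(conn, "Android")
```

Because every source lands in the same table, one query covers markets and forums alike; no per-site query logic is needed.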


Key Technology

Automatically obtain entities from darkweb. NO manual extraction.

Allows for automatic analysis not performed elsewhere.

Economic Analysis

Product Category Identification

Social Network Analysis

IntelliSpyre SpyrePortal platform screenshot, with callouts:

1. Zero Day Name

2. Description

3. Auto-identified category

4. Category on darkweb site

5. Date posted to darkweb

6. Price

7. Vendor Name
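One way to picture the seven extracted fields is as a single typed record. The sketch below is an assumption about how such a record could look in Python; the class name, field names, and the shape of the raw input dictionary are hypothetical and do not describe the SpyrePortal data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Listing:
    name: str                    # 1. product name as posted
    description: str             # 2. free-text description
    auto_category: str           # 3. category assigned automatically
    site_category: str           # 4. category shown on the darkweb site
    posted: Optional[date]       # 5. date posted, if the site shows one
    price_btc: Optional[float]   # 6. price, if listed
    vendor: str                  # 7. vendor name


def listing_from_raw(raw: dict) -> Listing:
    """Build a Listing from a hypothetical dict of scraped strings."""
    posted = date.fromisoformat(raw["posted"]) if raw.get("posted") else None
    price = float(raw["price"]) if raw.get("price") else None
    return Listing(
        name=raw["name"].strip(),
        description=raw.get("description", "").strip(),
        auto_category=raw.get("auto_category", "unknown"),
        site_category=raw.get("site_category", "unknown"),
        posted=posted,
        price_btc=price,
        vendor=raw["vendor"].strip(),
    )


example = listing_from_raw(
    {"name": " Sample exploit ", "vendor": "vendor1", "price": "2.0", "posted": "2015-09-14"}
)
```

Optional fields stay `None` when a site omits them, which matches listings that show no date.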

U.S. Provisional Patent 62/409,291

Copyright © 2016 IntelliSpyre, Inc. All rights reserved. Not for Public Release.



Key Technology

(Backend)

Our backend allows for significant reduction in manpower.

Our system greatly reduces the need for customization.

1. Small number of custom code modules needed

2. Plugs into crawler/parser framework

3. Repeated crawling of darkweb/deepweb hacking sites

4. Data stored in normalized database schema
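The four steps above can be sketched as a small plugin registry: each site contributes only a short parser function, and a shared routine runs the parsers and emits rows in one common shape. Everything here (the registry, the decorator, the site name, and the page format) is a hypothetical illustration, not the framework described in the talk.

```python
from typing import Callable, Dict, List

# Registry mapping a site identifier to its small custom parser function.
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}


def register_parser(site: str):
    """Decorator that plugs a site-specific parser into the shared framework."""
    def decorator(func):
        PARSERS[site] = func
        return func
    return decorator


@register_parser("market_a")  # hypothetical site name
def parse_market_a(page: str) -> List[dict]:
    """Custom module: this made-up site puts 'title | vendor | price' on each line."""
    items = []
    for line in page.strip().splitlines():
        title, vendor, price = [part.strip() for part in line.split("|")]
        items.append({"title": title, "vendor": vendor, "price_btc": float(price)})
    return items


def crawl_once(pages: Dict[str, str]) -> List[dict]:
    """Shared step: run each site's parser and tag every row with its site."""
    normalized = []
    for site, page in pages.items():
        for item in PARSERS[site](page):
            normalized.append({"site": site, **item})
    return normalized


rows = crawl_once({"market_a": "Keylogger | vendor1 | 0.2\nRAT builder | vendor2 | 0.8"})
```

Adding a new site then means writing one more decorated function; calling `crawl_once` on a schedule would correspond to the repeated crawling step.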

[Chart: number of sites (cumulative, axis 0 to 120) by month (axis labeled Sep-19 through Oct-20). Average lines of custom code written per new darkweb site: 438.8 and 57.5.]



Products Catalogued

Markets often sell goods and services that do not relate to malicious hacking, including drugs, pornography, weapons and software services. Forum discussions show a similar trend.

Only a small fraction of the data (13%) is related to malicious hacking.



Filtering Challenges

Text Cleaning – removal of all non-alphanumeric characters in tandem with stop-word removal.

Misspellings and Word Variations – in a bag-of-words approach, variations of a word are considered separately (e.g. hacker, hack, hackers, etc.). We use character n-grams in range(3, 5) to look for frequently grouped characters instead of words.

Large Feature Space – the feature matrix gets very large as the number of words increases (much larger for character n-grams). Use a sparse matrix representation.

Analyze title and description separately to preserve context.
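A minimal pure-Python sketch of these steps follows: cleaning, character n-gram counting, and a sparse row format that stores only nonzero counts. The stop-word list, function names, and sample strings are assumptions, and the n-gram range is treated as 3 to 5 inclusive, which is one possible reading of "range(3, 5)".

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "for", "and", "of", "to"}  # tiny illustrative list


def clean(text):
    """Lowercase, replace non-alphanumeric runs with a space, drop stop words."""
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


def char_ngrams(text, min_n=3, max_n=5):
    """Count character n-grams for every n from min_n to max_n inclusive."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def to_sparse_rows(docs):
    """Map each document to {column_index: count}; zero entries are never stored."""
    vocab = {}
    rows = []
    for doc in docs:
        row = {}
        for gram, count in char_ngrams(clean(doc)).items():
            col = vocab.setdefault(gram, len(vocab))
            row[col] = count
        rows.append(row)
    return vocab, rows


vocab, rows = to_sparse_rows(["The HACKER's toolkit!", "Hackers for hire"])
shared = set(rows[0]) & set(rows[1])  # columns both documents use
```

Here "hacker" and "hackers" share n-grams such as "hack", so the two documents overlap even though a word-level bag-of-words would treat the two words as unrelated. To keep title and description separate, the same function could be run once per field, giving each its own vocabulary.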



Filtering and Cleaning Information

Refined and tested machine learning models separate noise from important cyber threat information.

We achieve over 90% recall on malicious hacking items (malware, exploits, etc.) while minimizing false positives.
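For reference, recall is the share of truly malicious items the model catches, and precision is the share of flagged items that are truly malicious (so fewer false positives means higher precision). The sketch below computes both from labels; the label vectors are made-up toy data, not results from the talk.

```python
def recall_precision(y_true, y_pred):
    """Return (recall, precision) for binary labels where 1 = malicious hacking item."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Toy data: 4 truly malicious items, 6 benign ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
recall, precision = recall_precision(y_true, y_pred)
```

In this toy case three of four malicious items are found (recall 0.75) and one benign item is wrongly flagged (precision 0.75).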



Product Categorization

Use a combination of manual labeling and unsupervised clustering to get the desired categorization and specialization.
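One simple way to combine the two, shown here only as an assumed sketch, is to let a few manually labeled items name the cluster they fall in. The cluster assignments are taken as given (from any unsupervised method); the function name, item names, and labels are hypothetical.

```python
from collections import Counter, defaultdict


def propagate_labels(cluster_of, manual_labels):
    """Give each item the majority manual label of its cluster, or None if unlabeled."""
    votes = defaultdict(Counter)
    for item, label in manual_labels.items():
        votes[cluster_of[item]][label] += 1
    result = {}
    for item, cluster in cluster_of.items():
        if item in manual_labels:
            result[item] = manual_labels[item]          # keep the human label
        elif votes[cluster]:
            result[item] = votes[cluster].most_common(1)[0][0]
        else:
            result[item] = None                         # cluster needs manual review
    return result


# Cluster ids as produced by some clustering step (assumed input).
cluster_of = {"item1": 0, "item2": 0, "item3": 0, "item4": 1, "item5": 1, "item6": 2}
manual_labels = {"item1": "keylogger", "item2": "keylogger", "item4": "exploit"}
labels = propagate_labels(cluster_of, manual_labels)
```

Clusters with no labeled member come back as `None`, which flags them for the next round of manual labeling.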



Clustering Strategy

Using a clustering strategy, we group items into categories and continually refine at each step.



Automated Data Tagging

Unsupervised methods to group hacking products into categories



Hacking Product Analysis

Facebook: 119 Products. 67 Vendors. Most prolific vendor has 8 Products. Products spread across 15 Markets. Most well-represented Market has 30 Products.

Keyloggers: widely prevalent. It is a well-established hacking technique.
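Summary figures of this kind (product count, vendor count, most prolific vendor, market spread) can be derived from normalized listings with simple counting. The sketch below assumes each listing is a `(product, vendor, market)` tuple; the sample rows are invented and far smaller than the figures quoted above.

```python
from collections import Counter


def summarize(listings):
    """Summarize an iterable of (product, vendor, market) tuples."""
    listings = list(listings)
    by_vendor = Counter(vendor for _, vendor, _ in listings)
    by_market = Counter(market for _, _, market in listings)
    return {
        "products": len(listings),
        "vendors": len(by_vendor),
        "top_vendor": by_vendor.most_common(1)[0] if by_vendor else None,
        "markets": len(by_market),
        "top_market": by_market.most_common(1)[0] if by_market else None,
    }


# Invented sample rows for one product keyword.
sample = [
    ("account cracker", "v1", "m1"),
    ("phishing page", "v1", "m2"),
    ("brute forcer", "v2", "m1"),
    ("cookie stealer", "v1", "m1"),
]
summary = summarize(sample)
```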



Use Case: Vulnerability Prioritization

14,185 vulnerabilities disclosed in 2015 (RiskBased Security, 2015).

How to prioritize? Current practices do not consider threat capabilities.

99.9% of breaches in 2015 due to known vulnerabilities (Verizon, 2015).

Feb. 2015: Vulnerability CVE-2015-0057 for remote code execution. No known exploit – how do we prioritize?

April 2015: IntelliSpyre finds exploit on the darkweb: 48 BTC (~$10K). No public or commercial knowledge of the exploit.

July 2015: FireEye finds exploit in banking malware. First time known in public.

60 day

Anticipate and avoid
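A simple way to bring threat capability into prioritization, sketched here under stated assumptions, is to flag disclosed vulnerabilities whose CVE identifiers appear in darkweb listing text. The function name, the listing strings, and the placeholder identifier `CVE-0000-0000` are hypothetical; real listings often do not name a CVE at all, so this would only catch the explicit cases.

```python
import re

# CVE identifiers: "CVE", a four-digit year, then four or more digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def cves_with_darkweb_exploits(disclosed, listings):
    """Return the disclosed CVE IDs that are mentioned in any listing text."""
    mentioned = set()
    for text in listings:
        mentioned.update(match.upper() for match in CVE_PATTERN.findall(text))
    return [cve for cve in disclosed if cve.upper() in mentioned]


disclosed = ["CVE-2015-0057", "CVE-0000-0000"]  # second ID is a placeholder
listings = [
    "Selling exploit for cve-2015-0057, 48 BTC",  # invented listing text
    "Fresh keylogger, fully undetected",
]
priority = cves_with_darkweb_exploits(disclosed, listings)
```

Vulnerabilities returned by the function would move to the front of the patch queue, since an exploit is already being offered for them.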



Use Case: Vulnerability Prioritization

Exploit on the darkweb 5 days after vulnerability release

Exploit on the darkweb 15 days after vulnerability release

Exploit on the darkweb 19 days after vulnerability release
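Lags like these are plain date differences between a vulnerability's release and the first darkweb posting of an exploit. The sketch below computes them and sorts by shortest lag; the vulnerability labels and dates are invented so that they reproduce lags of 5, 15, and 19 days, and are not the actual cases on the slide.

```python
from datetime import date


def exploit_lag_days(released_on, posted_on):
    """Days between vulnerability release and first darkweb exploit posting."""
    return (posted_on - released_on).days


# Invented records: (label, release date, first darkweb posting date).
records = [
    ("vuln_a", date(2016, 3, 1), date(2016, 3, 20)),
    ("vuln_b", date(2016, 3, 1), date(2016, 3, 6)),
    ("vuln_c", date(2016, 3, 1), date(2016, 3, 16)),
]
by_urgency = sorted(
    ((name, exploit_lag_days(released, posted)) for name, released, posted in records),
    key=lambda pair: pair[1],
)
```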



Use Case: Identifying Zero Day Exploits

Title | Date | BTC Price

Windows 10 *HOT* (10.0.10586 Build 10586) | - | 2.00000

Windows 10 UAC (10.0.10586) | - | 5.0000

PowerPoint 03/07/10 exploit | Sep 14 2015 | 12.5954

Internet Explorer 11 Remote Code Execution 0day | August 23 2015 | 20.4676



Use Case: Hacker Economics

Select Russian Hacker Product Pricing

[Chart: price axis from 0 to 5500 in steps of 500.]



Use Case: Social Network Analysis

We identify malware vendors who have a presence in multiple marketplaces
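At its simplest, spotting such vendors is a grouping problem over vendor-market pairs. The sketch below assumes a vendor uses the same handle in every market, which is a strong simplification (linking different handles would need more than this); the function name and sample pairs are hypothetical.

```python
from collections import defaultdict


def multi_market_vendors(listings, min_markets=2):
    """Return {vendor: sorted markets} for vendors seen in at least min_markets markets."""
    markets_of = defaultdict(set)
    for vendor, market in listings:
        markets_of[vendor].add(market)
    return {
        vendor: sorted(markets)
        for vendor, markets in markets_of.items()
        if len(markets) >= min_markets
    }


# Invented (vendor, market) pairs.
pairs = [("v1", "m1"), ("v1", "m2"), ("v2", "m1"), ("v1", "m1"), ("v3", "m3")]
cross_market = multi_market_vendors(pairs)
```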



Use Case: Social Network Analysis

Ransomware victims are also often data-leakage victims.

Ransomware vendors and markets also sell data-leakage information.

Through link analysis starting from ransomware vendors, IntelliSpyre can identify where data leaks are sold.

Quick location of data leaks after ransomware incidents.

Vendor Market Network (IntelliSpyre screenshot)

Hacker selling ransomware (black)

Hackers selling data leaks and ransomware in 3x different markets

Hackers selling data leaks in markets that sell ransomware (blue)
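The link analysis described here can be approximated with set operations over categorized listings: find data-leak sellers who either also sell ransomware or operate in a market where ransomware is sold. This is an assumed simplification of the idea; the function name, category strings, and sample tuples are hypothetical.

```python
def leak_vendors_linked_to_ransomware(listings):
    """listings: (vendor, market, category) tuples. Return {vendor: kind of link}."""
    ransomware_markets = {m for _, m, c in listings if c == "ransomware"}
    ransomware_vendors = {v for v, _, c in listings if c == "ransomware"}
    linked = {}
    for vendor, market, category in listings:
        if category != "data_leak":
            continue
        if vendor in ransomware_vendors:
            linked[vendor] = "sells both"                  # strongest link
        elif market in ransomware_markets:
            linked.setdefault(vendor, "shares a market")   # weaker, market-level link
    return linked


# Invented sample listings.
sample = [
    ("v1", "m1", "ransomware"),
    ("v1", "m2", "data_leak"),
    ("v2", "m1", "data_leak"),
    ("v3", "m3", "data_leak"),
]
links = leak_vendors_linked_to_ransomware(sample)
```

After a ransomware incident, the vendors and markets returned here would be the first places to check for the victim's leaked data.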



Use Case: Game Theoretic Risk Assessment

Threat intelligence data can feed mathematical models of risk.

We can assess the most damaging exploits to a given system.
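To make the idea concrete, here is a deliberately small attacker/defender model, offered as an assumption rather than the method covered by the patent filing: the defender patches one component, the attacker then uses the most damaging exploit that still works, and the defender picks the patch that minimizes that worst case. Exploit names, components, and damage scores are invented.

```python
def most_damaging_exploit(exploits):
    """exploits: {name: (component, damage)}. Return the name with the highest damage."""
    return max(exploits, key=lambda name: exploits[name][1])


def best_patch(exploits, patch_options):
    """Return (component, worst_case_damage) minimizing the attacker's best response."""
    def attacker_best(patched):
        remaining = [dmg for comp, dmg in exploits.values() if comp != patched]
        return max(remaining, default=0)

    return min(
        ((comp, attacker_best(comp)) for comp in patch_options),
        key=lambda pair: pair[1],
    )


# Invented exploits available to the attacker: name -> (target component, damage score).
exploits = {
    "e1": ("browser", 9),
    "e2": ("browser", 4),
    "e3": ("os_kernel", 7),
    "e4": ("office", 3),
}
top = most_damaging_exploit(exploits)
choice = best_patch(exploits, ["browser", "os_kernel", "office"])
```

In this toy setup, patching the browser removes the two browser exploits and leaves the attacker with a best remaining damage of 7, which is lower than the outcome of any other single patch.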

U.S. Provisional Patent 62/261,200



And now a short demonstration…

No cameras or video.



We Are a Platform and Have Active Developers

Developers

Research Sponsors



IntelliSpyre made the semi-finals

15 semi-finalists: 7x from U.S., 2x cybersecurity, 1x from Arizona


info@intellispyre.com

intellispyre.com

Thank You!

@PauloShakASU

@intellispyre

INNOVATION GRAND CHALLENGE SEMI-FINALIST 2016
