12.07.2015 Views

Parallel Processing in Netflow Data Fusion - Cert

Parallel Processing in Netflow Data Fusion - Cert

Parallel Processing in Netflow Data Fusion - Cert

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Parallel</strong> <strong>Process<strong>in</strong>g</strong> <strong>in</strong> <strong>Netflow</strong><strong>Data</strong> <strong>Fusion</strong>•George Saylor•Michael Rash•G2, Inc.•Flocon, 2010


Agenda• Sett<strong>in</strong>g the stage – a description offactors• Environment• <strong>Parallel</strong> process<strong>in</strong>g to facilitate:• Fus<strong>in</strong>g large netflow dataset with other largedatasets• Jo<strong>in</strong><strong>in</strong>g netflow and other data to enrichmentdata such as IP reputation and GeoIP lists• Incident <strong>in</strong>vestigation on long time rangesus<strong>in</strong>g fused datasets


Agenda cont’d• Specific use cases• Social Networks for Computers and Networks• Long Term <strong>Data</strong> Exfiltration• Compar<strong>in</strong>g netflow, system characteristics,and time to determ<strong>in</strong>e impact• DNS Fast Flux detection and associated botnetdetection• SPAM behavior on a given network


Sett<strong>in</strong>g the stage• Target<strong>in</strong>g very large environments(ISP)• > than 2 trillion netflow records per year• Exist<strong>in</strong>g large siloed data stores of:• Large netflow database• Assets – with history• System characteristics – with history• DNS queries/responses• Large store of PCAP data


Sett<strong>in</strong>g the Stage Cont…• Goals Cont…• Unified query <strong>in</strong>terface (<strong>in</strong>stead of script<strong>in</strong>g)• Ability to run aga<strong>in</strong>st big datasets (withouthurt<strong>in</strong>g anyone else)• Iterative discovery by pivot<strong>in</strong>g on results ofprevious queries• Large scale summary and computation• Beacon detection over long time ranges• Correlat<strong>in</strong>g with other <strong>in</strong>dicators• System Level Social Network Analysis


Environment• Selected a comput<strong>in</strong>g cluster thatwas optimized for:• Very large jo<strong>in</strong>s• Large data searches• Fast <strong>in</strong>dexed access to data• Index build<strong>in</strong>g does not affect end users• Index builds on one system, distributes toqueried nodes• Ability to partition on values and ranges


Use Case: Social NetworkGrowth• Social Networks for Computers and Networks• Given an IP address or addresses, ornetwork range characterize relationshipsbetween those and other computers ornetworks• Scenario:• An IDS alert provides some <strong>in</strong>dication that agiven host is compromised• We may want to know who this mach<strong>in</strong>etalks to• Whether the mach<strong>in</strong>e’s “friends list” (othersystems) is grow<strong>in</strong>g rapidly


Use Case: <strong>Data</strong> ExfiltrationDiscovery• Social Network prioritization based on bytestransferred• Scenario:• An IDS alert provides some <strong>in</strong>dication that agiven host is compromised.• Cross reference attacker’s IP with the GeoIPand other reputation databases.• Construct the social network of othersystems this IP communicates with.• Sort by bytes transferred – the top systems<strong>in</strong> this list may be candidates for dataexfiltration nodes.


Use Case: SPAM Node and C2Discovery• SPAM nodes and mail server error codethresholds• Scenario:• IDS signatures for threshold<strong>in</strong>g largenumbers of recipient unknown error codessent back from mail servers to localsystems.• Build system level social networkconcentrat<strong>in</strong>g on command and controlcommunications and other observables.• Try to identify command and controlconnections – may become obvious withoverlapp<strong>in</strong>g external IP’s <strong>in</strong> SPAM node


Use Case: DNS Fast Flux• Stor<strong>in</strong>g large number of DNS requests andresponses• Scenario:• Use threshold<strong>in</strong>g operation for the numberof IP addresses given <strong>in</strong> response to DNSrequests for the same hostname.• Also use low TTL values <strong>in</strong> DNS responsesas a search metric.• Look for “unusual” DNS hostnames that usemultiple numbers and strange patterns (e.g.“f4n3upyhqj.com”).


Use Case: Complex <strong>Fusion</strong>• Mix<strong>in</strong>g together numerous data sources toga<strong>in</strong> better situational awareness• Scenario:• Retrieve all flows where:• For a given IP list (derived from previousexercises for example).• Platform is W<strong>in</strong>dows XP and the patch level isbelow a m<strong>in</strong>imum level or there is no netflowrecord to the W<strong>in</strong>dows Update service to<strong>in</strong>dicate that it has been patched.• Exhibits communications to disreputablehosts and/or suspicious geo locations.• Compute this via a s<strong>in</strong>gle <strong>in</strong>terface.


Questions?george.saylor@g2-<strong>in</strong>c.commichael.rash@g2-<strong>in</strong>c.com

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!