13.07.2015 Views

Data Mining: Practical Machine Learning Tools and ... - LIDeCC



8.4 ADVERSARIAL SITUATIONS

statistical tests such as cross-validation. Finally, the bad guys can also use machine learning. For example, if they could get hold of examples of what your filter blocks and what it lets through, they could use this as training data to learn how to evade it.

There are, unfortunately, many other examples of adversarial learning situations in our world today. Closely related to junk email is search engine spam: sites that attempt to deceive Internet search engines into placing them prominently in lists of search results. Highly ranked pages yield direct financial benefits to their owners because they present opportunities for advertising, providing strong motivation for profit seekers. Then there are the computer virus wars, in which designers of viruses and virus-protection software react to one another's innovations. Here the motivation tends to be general disruption and denial of service rather than monetary gain.

Computer network security is a continually escalating battle. Protectors harden networks, operating systems, and applications, and attackers find vulnerabilities in all three areas. Intrusion detection systems sniff out unusual patterns of activity that might be caused by a hacker's reconnaissance activity. Attackers realize this and try to obfuscate their trails, perhaps by working indirectly or by spreading their activities over a long time or, conversely, by striking very quickly. Data mining is being applied to this problem in an attempt to discover semantic connections among attacker traces in computer network data that intrusion detection systems miss. This is a large-scale problem: audit logs used to monitor computer network security can amount to gigabytes a day even in medium-sized organizations.

Many automated threat detection systems are based on matching current data to known attack types. The U.S. Federal Aviation Administration developed the Computer Assisted Passenger Pre-Screening System (CAPPS), which screens airline passengers on the basis of their flight records and flags individuals for additional checked baggage screening. Although the exact details are unpublished, CAPPS is, for example, thought to assign higher threat scores to cash payments. However, this approach can only spot known or anticipated threats. Researchers are using unsupervised approaches such as anomaly and outlier detection in an attempt to detect suspicious activity. As well as flagging potential threats, anomaly detection systems can be applied to the detection of illegal activities such as financial fraud and money laundering.

Data mining is being used today to sift through huge volumes of data in the name of homeland defense. Heterogeneous information such as financial transactions, health-care records, and network traffic is being mined to create profiles, construct social network models, and detect terrorist communications. This activity raises serious privacy concerns and has resulted in the development of privacy-preserving data mining techniques. These algorithms try to discern patterns in the data without accessing the original data directly,
