
<strong>Artificial</strong> <strong>Immune</strong> System<br />

and Its Applications<br />

Prof. Ying TAN<br />

National Laboratory on Machine Perception<br />

Department of Intelligence Science<br />

Peking University, Beijing 100871, P.R.China<br />

2005-12-13 Y. Tan---<strong>Artificial</strong> <strong>Immune</strong> Sys. 1


Contents<br />

• Biological <strong>Immune</strong> System<br />

• <strong>Artificial</strong> <strong>Immune</strong> System<br />

• Basic Algorithms of AIS<br />

• AIS design procedure<br />

• Case Studies<br />

– Malicious Executable Detection<br />

– Film Recommender<br />

New<br />

• Immunocomputing – IC<br />

• Danger Theory<br />

• Future<br />



The <strong>Immune</strong> System is…<br />

<strong>Immune</strong> system: a system that<br />

protects the body from foreign<br />

substances and pathogenic<br />

organisms by producing the<br />

immune response<br />

Immunity: state or quality of<br />

being resistant (immune), either<br />

by virtue of previous exposure<br />

(adaptive immunity) or as an<br />

inherited trait (innate immunity)<br />



Why the <strong>Immune</strong> System?<br />

The <strong>immune</strong> system has the following appealing features:<br />

• Recognition<br />

– Anomaly detection<br />

– Noise tolerance<br />

• Robustness<br />

• Feature extraction<br />

• Diversity<br />

• Reinforcement learning<br />

• Memory<br />

• Dynamically changing coverage<br />

• Distributed<br />

• Multi-layered<br />

• Adaptive<br />



Role of Biological <strong>Immune</strong> System<br />

• Protect our bodies from pathogens such as<br />

viruses<br />

• Primary immune response<br />

– Launch a response to invading pathogens<br />

• Secondary immune response<br />

– Remember past encounters<br />

– Faster response the second time around<br />



<strong>Immune</strong> cells<br />

• There are two primary types of<br />

lymphocytes:<br />

– B-lymphocytes (B cells)<br />

– T-lymphocytes (T cells)<br />

• Other components include macrophages and other<br />

phagocytic cells, and signalling proteins such as cytokines<br />



Where is it?<br />

Primary lymphoid organs<br />

Secondary lymphoid organs<br />

Tonsils and adenoids<br />

Thymus<br />

Spleen<br />

Peyer’s patches<br />

Appendix<br />

Bone marrow<br />

Lymph nodes<br />

Lymphatic vessels<br />



Multiple layers of the immune system<br />

Pathogens<br />

Skin<br />

Biochemical<br />

barriers<br />

Innate<br />

immune<br />

response<br />

Phagocyte<br />

Lymphocytes<br />

Adaptive<br />

immune<br />

response<br />



Antigen<br />

• Substances capable of starting a<br />

specific immune response commonly<br />

are referred to as antigens<br />

• This includes some pathogens such as<br />

viruses, bacteria, fungi, etc.<br />



Biological <strong>Immune</strong> System<br />

Innate<br />

vs<br />

Acquired<br />

Cell Mediated<br />

vs<br />

Humoral<br />

T Cell (Killer)<br />

T Cell (Helper)<br />

B Cell<br />

Secretes<br />

Antibody<br />



How does IS work: A simplistic view<br />

[Figure: steps (I) to (VII) linking the labels antigen, APC, MHC protein, peptide, T-cell, activated T-cell, lymphokines, B-cell, and activated B-cell (plasma cell)]<br />



Self/Non-Self Recognition<br />

• <strong>Immune</strong> system needs to be able to<br />

differentiate between self and non-self<br />

cells<br />

• Antigenic encounters may result in cell<br />

death, therefore<br />

– Some kind of positive selection<br />

– Some element of negative selection<br />



<strong>Immune</strong> Pattern Recognition<br />

BCR or Antibody<br />

B-cell Receptors (Ab)<br />

Epitopes<br />

B-cell<br />

Antigen<br />

• The immune recognition is based on the complementarity<br />

between the binding region of the receptor and a portion of the<br />

antigen called epitope.<br />

• Antibodies present a single type of receptor, while antigens might<br />

present several epitopes.<br />

– This means that different antibodies can recognize a single<br />

antigen<br />



Clonal Selection<br />

Clonal deletion<br />

(negative selection)<br />

Self-antigen<br />

Proliferation<br />

(Cloning)<br />

M<br />

M<br />

Antibody<br />

Selection<br />

Differentiation<br />

Memory cells<br />

Plasma cells<br />

Foreign antigens<br />

Self-antigen<br />

Clonal deletion<br />

(negative selection)<br />



Main Properties of Clonal<br />

Selection (Burnet, 1978)<br />

• Elimination of self-reactive lymphocytes (clonal deletion)<br />

• Proliferation and differentiation on contact of<br />

mature lymphocytes with antigen<br />

• Restriction of one pattern to one differentiated<br />

cell and retention of that pattern by clonal<br />

descendants<br />

• Generation of new random genetic changes,<br />

subsequently expressed as diverse antibody<br />

patterns by a form of accelerated somatic<br />

mutation<br />



<strong>Immune</strong> Network Theory<br />

• Idiotypic network (Jerne, 1974)<br />

• B cells co-stimulate each other<br />

– Treat each other a bit like antigens<br />

• Creates an immunological memory<br />

Paratope<br />

Ag<br />

Suppression<br />

Negative response<br />

Idiotope<br />

1<br />

2<br />

3<br />

Antibody<br />

Activation<br />

Positive response<br />



Reinforcement Learning and<br />

<strong>Immune</strong> Memory<br />

• Repeated exposure to an antigen<br />

throughout a lifetime<br />

• Primary, secondary immune responses<br />

• Remembers encounters<br />

– No need to start from scratch<br />

– Memory cells<br />

• Continuous learning<br />



Learning (2)<br />

[Figure: antibody concentration against time, in three phases, each starting with a lag. Primary response: antigen Ag1 is presented and produces a response to Ag1. Secondary response: antigens Ag1 and Ag2 are presented and produce responses to Ag1 and to Ag2. Cross-reactive response: antigen Ag1 + Ag3 is presented and produces a response to Ag1′ = Ag1 + Ag3.]<br />


<strong>Immune</strong> System: Summary<br />

• Distinguish host (body cells) from external entities.<br />

• When an entity is recognized as foreign (or<br />

dangerous), activate several defense<br />

mechanisms leading to its destruction (or<br />

neutralization).<br />

• Subsequent exposure to similar entity results in<br />

rapid immune response.<br />

• Overall behavior of the immune system is an<br />

emergent property of many local interactions.<br />


<strong>Immune</strong> metaphors<br />

[Figure labels: <strong>Immune</strong> System, Other areas, Idea!, Idea′, <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong>]<br />



Definition<br />

What is an <strong>Artificial</strong> <strong>Immune</strong><br />

System?<br />

Dasgupta’99: “<strong>Artificial</strong> immune systems (AIS) are<br />

intelligent and adaptive systems inspired by the<br />

immune system toward real-world problem solving”<br />

de Castro and Timmis: “<strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong><br />

(AIS) are adaptive systems, inspired by<br />

theoretical immunology and observed immune<br />

functions, principles and models, which are<br />

applied to problem solving”<br />

http://www.cs.kent.ac.uk/people/staff/jt6/aisbook/<br />

• Using the natural immune system as a metaphor for solving complex computational problems.<br />

• Not modelling the immune system<br />



AI models and their<br />

corresponding natural prototypes<br />

Natural prototype | Biological level | AI model<br />

Natural language | Left hemisphere of brain | Formal logic, formal linguistics<br />

Brain nervous net | Cells | Neural computing (NC), neural networks (NN)<br />

Biological cells | Cells | Cellular automata (CA)<br />

Molecules of proteins | Molecular | <strong>Artificial</strong> immune systems (AIS)<br />

Genetic code | Molecular | Genetic algorithms (GA)<br />



Some History<br />

• Developed from the field of theoretical<br />

immunology in the mid 1980’s.<br />

– Suggested we ‘might look’ at the IS<br />

• 1990 – Bersini first use of immune<br />

algorithms to solve problems<br />

• Forrest et al – Computer Security mid<br />

1990’s<br />

• Hunt et al, mid 1990’s – Machine learning<br />

• More……<br />



AIS’ Scope<br />

• Pattern recognition;<br />

• Fault and anomaly detection;<br />

• Data analysis;<br />

• Data mining (classification/clustering)<br />

• Agent-based systems;<br />

• Scheduling;<br />

• Machine-learning;<br />

• Autonomous navigation and control;<br />

• Search and optimization methods;<br />

• <strong>Artificial</strong> life;<br />

• Security of information systems;<br />

• Optimization;<br />

• Just to name a few.<br />


Typical Applications of AIS<br />

• Computer Security(Forrest’94’96’98, Kephart’94, Lamont’98’01,02,<br />

Dasgupta’99’01, Bentley’00’01,02)<br />

• Anomaly Detection (Dasgupta’96’01’02)<br />

• Fault Diagnosis (Ishida’92’93, Ishiguro’94)<br />

• Data Mining & Retrieval (Hunt’95’96, Timmis’99’01, ’02)<br />

• Pattern Recognition (Forrest’93, Gibert’94, de Castro ’02)<br />

• Adaptive Control (Bersini’91)<br />

• Job shop Scheduling (Hart’98, ’01, ’02)<br />

• Chemical Pattern Recognition (Dasgupta’99)<br />

• Robotics (Ishiguro’96’97,Singh’01)<br />

• Optimization (DeCastro’99,Endo’98, de Castro ’02)<br />

• Web Mining (Nasraoui’02, Secker’05)<br />

• Fault Tolerance (Tyrrell, ’01, ’02, Timmis ’02)<br />

• Autonomous <strong>Systems</strong> (Varela’92,Ishiguro’96)<br />

• Engineering Design Optimization (Hajela’96 ’98, Nunes’00)<br />



Basic <strong>Immune</strong> Models and<br />

Algorithms<br />

• Bone Marrow Models<br />

• Negative Selection Algorithms<br />

• Clonal Selection Algorithm<br />

• <strong>Immune</strong> Network Models<br />

• Somatic Hypermutation<br />



Bone Marrow Models<br />

• Gene libraries are used to create antibodies<br />

from the bone marrow<br />

• Antibody production through a random<br />

concatenation from gene libraries<br />

• Simple or complex libraries<br />

An individual genome corresponds to four libraries:<br />

Library 1: A1 A2 A3 A4 A5 A6 A7 A8<br />

Library 2: B1 B2 B3 B4 B5 B6 B7 B8<br />

Library 3: C1 C2 C3 C4 C5 C6 C7 C8<br />

Library 4: D1 D2 D3 D4 D5 D6 D7 D8<br />

Selected segments: A3, B2, C8, D5 = four 16 bit segments<br />

Expressed Ab molecule: A3 B2 C8 D5 = a 64 bit chain<br />
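The library scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the library contents are random placeholders, and the names `make_libraries` and `express_antibody` are hypothetical. Only the sizes (four libraries, eight segments each, 16 bits per segment) come from the slide.

```python
import random

SEGMENT_BITS = 16   # each library entry is one 16-bit segment (from the slide)
LIBRARY_SIZE = 8    # eight segments per library (A1..A8, B1..B8, ...)
NUM_LIBRARIES = 4   # four libraries per genome


def make_libraries(rng):
    """Build four libraries of eight random 16-bit segments (placeholder data)."""
    return [
        [rng.getrandbits(SEGMENT_BITS) for _ in range(LIBRARY_SIZE)]
        for _ in range(NUM_LIBRARIES)
    ]


def express_antibody(libraries, rng):
    """Pick one segment per library and concatenate them into one bit string."""
    picks = [rng.randrange(len(lib)) for lib in libraries]
    bits = "".join(
        format(lib[i], f"0{SEGMENT_BITS}b") for lib, i in zip(libraries, picks)
    )
    return picks, bits


rng = random.Random(0)  # fixed seed so the sketch is repeatable
libraries = make_libraries(rng)
picks, antibody = express_antibody(libraries, rng)
print(picks, antibody)
```

With 8 choices in each of 4 libraries, this small genome can express 8^4 = 4096 distinct 64-bit antibodies.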



Negative Selection (NS) Algorithms<br />

• Forrest 1994: Idea taken from the negative<br />

selection of T-cells in the thymus<br />

• Applied initially to computer security<br />

• Split into two parts:<br />

–Censoring<br />

– Monitoring<br />

Censoring: generate random strings (R0) and match each one against the self strings (S); on a match, reject the string; otherwise add it to the detector set (R).<br />

Monitoring: match the protected strings (S) against the detector set (R); on a match, non-self is detected.<br />
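The censoring and monitoring phases can be sketched as follows. The slide does not state a matching rule, so this sketch assumes r-contiguous-bit matching (two strings match when they agree in at least r consecutive positions); the function names and parameter values are illustrative only.

```python
import random


def matches(a, b, r):
    """True if bit strings a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def censor(self_strings, n_detectors, length, r, rng, max_tries=100_000):
    """Censoring: keep random candidates (R0) that match no self string."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(candidate, s, r) for s in self_strings):
            detectors.append(candidate)
    return detectors


def monitor(strings, detectors, r):
    """Monitoring: report every string matched by at least one detector."""
    return [s for s in strings if any(matches(s, d, r) for d in detectors)]


rng = random.Random(1)
self_set = ["0000000000", "0000011111"]
detectors = censor(self_set, n_detectors=20, length=10, r=6, rng=rng)
print(monitor(self_set + ["1111100000"], detectors, r=6))
```

By construction no detector matches a self string, so unchanged self strings are never flagged. Whether a given changed string is caught depends on how well the detector set covers non-self space.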



Clonal Selection Algorithm (de<br />

Castro & von Zuben, 2001)<br />

1. Initialisation: Randomly initialise a population (P)<br />

2. Antigenic Presentation: for each pattern in Ag, do:<br />

2.1 Antigenic binding: determine affinity to each P<br />

2.2 Affinity maturation: select n highest affinity from P,<br />

clone them in proportion to their affinity with Ag, mutate the clones at a rate that falls as affinity rises, then add the new<br />

mutants to P<br />

3. Metadynamics:<br />

3.1 select highest affinity P to form part of M<br />

3.2 replace n low-affinity members of P with randomly generated new ones<br />

4. Cycle: repeat 2 and 3 until stopping criteria (e.g. Max Generation)<br />
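The four steps above can be sketched on bit strings. This is a simplified illustration under stated assumptions, not the published CLONALG: affinity is taken as the number of matching bits, clone counts follow rank, the mutation rate is `1 - affinity/L`, and the population is truncated back to a fixed size. All names and parameter values are hypothetical.

```python
import random

L = 16  # bit-string length (assumed)


def affinity(ab, ag):
    """Number of matching bits; higher means a better match (assumed measure)."""
    return sum(a == g for a, g in zip(ab, ag))


def random_cell(rng):
    return [rng.randint(0, 1) for _ in range(L)]


def mutate(cell, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in cell]


def clonalg(antigens, rng, pop_size=20, n_select=5, n_replace=4, generations=40):
    population = [random_cell(rng) for _ in range(pop_size)]   # step 1
    memory = {}                                                # M: best cell per antigen
    for _ in range(generations):                               # step 4
        for idx, ag in enumerate(antigens):                    # step 2
            # 2.1: rank the population by affinity to this antigen
            population.sort(key=lambda c: affinity(c, ag), reverse=True)
            mutants = []
            for rank, cell in enumerate(population[:n_select]):  # 2.2
                n_clones = n_select - rank            # more clones for higher affinity
                rate = max(1.0 - affinity(cell, ag) / L, 1.0 / L)  # less mutation for higher affinity
                mutants += [mutate(cell, rate, rng) for _ in range(n_clones)]
            population = sorted(population + mutants,
                                key=lambda c: affinity(c, ag),
                                reverse=True)[:pop_size]
            best = population[0]                               # 3.1
            if idx not in memory or affinity(best, ag) > affinity(memory[idx], ag):
                memory[idx] = best
            # 3.2: replace the lowest-affinity cells with random newcomers
            population[-n_replace:] = [random_cell(rng) for _ in range(n_replace)]
    return memory


rng = random.Random(42)
antigens = [[1] * L, [0, 1] * (L // 2)]
memory = clonalg(antigens, rng)
for idx, ag in enumerate(antigens):
    print(idx, affinity(memory[idx], ag), "of", L)
```

The memory entry for each antigen only ever improves, which mirrors the role of M in step 3.1.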



CLONALG for PR, Learning, Optimization<br />

[Figure: CLONALG block diagram with labels Ag_j, Ab{r}, Ab{m}, Ab{n}, Ab{d}, f_j, f_j*, C_j, C_j*, Ab_j* and the steps Select and Clone]<br />

L.N. de Castro et al., Learning and optimization using the clonal selection principle, IEEE Trans. Evolutionary Computation, vol. 6, no. 3, June 2002, pp. 239-251<br />



Discrete <strong>Immune</strong> Network<br />

Models (Timmis & Neal, 2001)<br />

1. Initialisation: create an initial network from a sub-section of the antigens<br />

2. Antigenic presentation: for each antigenic pattern, do:<br />

2.1 Clonal selection and network interactions: for each network cell,<br />

determine its stimulation level (based on antigenic and network interaction)<br />

2.2 Metadynamics: eliminate network cells with a low stimulation<br />

2.3 Clonal Expansion: select the most stimulated network cells and<br />

reproduce them proportionally to their stimulation<br />

2.4 Somatic hypermutation: mutate each clone<br />

2.5 Network construction: select mutated clones and integrate<br />

3. Cycle: Repeat step 2 until termination condition is met<br />



<strong>Immune</strong> Network Models<br />

• Timmis & Neal, 2000<br />

• Used immune network theory as a basis,<br />

proposed the AINE algorithm<br />

Initialize AIN<br />

For each antigen<br />

    Present antigen to each ARB in the AIN<br />

    Calculate ARB stimulation level<br />

    Allocate B cells to ARBs, based on stimulation level<br />

    Remove weakest ARBs (ones that do not hold any B cells)<br />

If termination condition met<br />

    exit<br />

else<br />

    Clone and mutate remaining ARBs<br />

    Integrate new ARBs into AIN<br />



<strong>Immune</strong> Network Models<br />

• De Castro & Von Zuben (2000c)<br />

• aiNet, based on similar principles<br />

At each iteration step do<br />

    For each antigen do<br />

        Determine affinity to all network cells<br />

        Select n highest affinity network cells<br />

        Clone these n selected cells<br />

        Increase the affinity of the cells to antigen by reducing the distance between them (greedy search)<br />

        Calculate improved affinity of these n cells<br />

        Re-select a number of improved cells and place into matrix M<br />

        Remove cells from M whose affinity is below a set threshold<br />

        Calculate cell-cell affinity within the network<br />

        Remove cells from network whose affinity is below a certain threshold<br />

        Concatenate original network and M to form new network<br />

    Determine whole network inter-cell affinities and remove all those below the set threshold<br />

    Replace r% of worst individuals by novel randomly generated ones<br />

    Test stopping criterion<br />


Somatic Hypermutation<br />

• Mutation rate inversely proportional to affinity<br />

• Very controlled mutation in the natural immune<br />

system<br />

• Trade-off between the normalized antibody<br />

affinity D* and its mutation rate α,<br />

[Figure: mutation rate α (vertical axis, 0 to 1) against normalized affinity D* (horizontal axis, 0 to 1), with one curve each for ρ = 5, ρ = 10 and ρ = 20]<br />
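The slide does not give the formula behind the curves for ρ = 5, 10 and 20. A common choice in the clonal selection literature is an exponential decay, α = exp(−ρ·D*), and the sketch below assumes that form. Treat both the formula and the function name `mutation_rate` as assumptions rather than as the slide's definition.

```python
import math


def mutation_rate(d_star, rho):
    """Assumed form alpha = exp(-rho * D*), with D* normalized to [0, 1]."""
    if not 0.0 <= d_star <= 1.0:
        raise ValueError("D* must lie in [0, 1]")
    return math.exp(-rho * d_star)


# Tabulate alpha at D* = 0, 0.25, 0.5, 0.75, 1 for the three rho values on the slide.
for rho in (5, 10, 20):
    row = [round(mutation_rate(d / 4, rho), 4) for d in range(5)]
    print(f"rho={rho}: {row}")
```

Under this assumption a larger ρ makes the mutation rate fall off faster, so high-affinity antibodies are perturbed very little.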


General Framework of AIS<br />

Solution<br />

<strong>Immune</strong> Algorithms<br />

Affinity Measures<br />

Representation<br />

Problem<br />

Application Domain<br />



Representation – Shape Space<br />

• Describe the general shape of a molecule<br />

Antigen<br />

Antibody<br />

•Describe interactions between molecules<br />

•Degree of binding between molecules<br />



Representation<br />

•Vectors<br />

$Ab = \langle Ab_1, Ab_2, \ldots, Ab_L \rangle$<br />

$Ag = \langle Ag_1, Ag_2, \ldots, Ag_L \rangle$<br />

• Real-valued shape-space<br />

• Integer shape-space<br />

• Binary shape-space<br />

• Symbolic shape-space<br />



Define their Interaction<br />

• Define the term Affinity<br />

• Affinity is related to distance<br />

– Euclidean<br />

$D = \sqrt{\sum_{i=1}^{L} (Ab_i - Ag_i)^2}$<br />

• Other distance measures such as Hamming, Manhattan, etc.<br />

• Affinity Threshold<br />
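The three distance measures named on this slide, plus a threshold test, can be written as below. The recognition rule (distance at or below the threshold) is an assumption for illustration; in some representations, such as complementary binary strings, a larger distance is read as higher affinity instead.

```python
import math


def euclidean(ab, ag):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ab, ag)))


def manhattan(ab, ag):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(ab, ag))


def hamming(ab, ag):
    """Number of positions where the two vectors differ."""
    return sum(x != y for x, y in zip(ab, ag))


def recognizes(ab, ag, threshold, distance=euclidean):
    """Assumed rule: recognition when the distance is within the threshold."""
    return distance(ab, ag) <= threshold


ab = [1.0, 2.0, 3.0]
ag = [1.0, 4.0, 3.0]
print(euclidean(ab, ag), manhattan(ab, ag), hamming(ab, ag))
print(recognizes(ab, ag, threshold=2.5))
```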



Shape Space Formalism<br />

• Repertoire of the immune system is complete (Perelson, 1989)<br />

• Extensive regions of complementarity<br />

• Some threshold of recognition<br />

[Figure: shape space V containing recognition regions V_ε, each of radius ε]<br />


AIS Design<br />

• Problem description<br />

• Deciding the immune principles used for<br />

problem solving<br />

• Engineering the AIS<br />

– Defining the types of immune components used<br />

– Defining the representation for the elements of the AIS<br />

– Applying immune principle to problem solving<br />

– The meta-dynamics of an AIS<br />

• Reverse mapping from AIS to the real problem<br />


Case Studies of AIS<br />

• Malicious Executables Detection ---<br />

From Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based<br />

Malicious Executables Detection Algorithm<br />

based on <strong>Immune</strong> Principles, F.Yin, J.Wang, C.<br />

Guo (Eds.): ISNN 2004, Springer, Lecture Notes<br />

in Computer Science 3174, pp. 675-680, 2004.<br />

(http://dblp.uni-trier.de)<br />

• Film Recommender --- From Dr. Uwe<br />

Aickelin (http://www.aickelin.com), University of<br />

Nottingham, U.K. 2004<br />



New!<br />

Immunocomputing -- IC<br />

By Tarakanov, A. 2001.<br />

Aims:<br />

• A proper mathematical framework;<br />

• A new kind of computing;<br />

• A new kind of hardware.<br />

New concepts:<br />

• formal protein (FP) vs. neuron<br />

• formal immune network (FIN) vs. NN<br />

Refer to<br />

• A.O. Tarakanov, V.A. Skormin, and S.P. Sokolova,<br />

Immunocomputing: Principles and Applications, Springer, 2003.<br />



Problems of Traditional Self/Non-self View<br />

• No reaction to foreign bacteria in gut (friendly<br />

bacteria…).<br />

• No reaction to food / air / etc.<br />

• The human body changes over its life.<br />

• Auto-immune diseases.<br />

• How do we produce antibodies that react against<br />

antigens and yet avoid self?<br />

• Is it necessary to attack all non-self or a specific self?<br />



New!<br />

The Danger Theory<br />

• In the danger model, the idea is to recognise ‘danger’<br />

rather than non self.<br />

• The screening is accomplished post production through<br />

an external ‘danger’ signal. Thus the production of<br />

autoreactive antibodies (which react to self) is allowed.<br />

• If an (e.g. autoreactive) antibody matches a stimulus in<br />

the absence of danger, it is removed. Thus harmless<br />

antigens are tolerated, and changing self accommodated.<br />

Matzinger (2002). The Danger Model: A renewed sense of self , Science 296:<br />

301-304.<br />



Danger Theory (cont’d)<br />

• Danger Theory<br />

– Not self/non-self but Danger/Non-Danger<br />

– <strong>Immune</strong> response is initiated in the tissues.<br />

Danger Zone.<br />

– This makes it context dependent<br />

• Matzinger (2002) The Danger Model: A renewed sense of self<br />

Science 296: 301-304<br />

• Aickelin & Cayzer (2002) The Danger Theory and Its Application<br />

to <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong>, Proc. International Conference on AIS<br />

(ICARIS 2002)<br />



Danger Zone<br />

[Figure: a damaged cell emits a danger signal that defines a danger zone around it. Antibody/antigen pairs are labelled "Stimulation", "Match, but too far away" or "No match". Legend: antibodies, antigens, cells, damaged cell, danger signal.]<br />
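One way to read the three outcome labels on this slide is sketched below. It assumes a spatial danger zone modelled as a circle of fixed radius around the damaged cell, and exact equality as the matching rule; both choices, and the function names, are illustrative assumptions rather than anything the slide specifies.

```python
import math


def in_danger_zone(position, danger_source, radius):
    """True if the position lies within `radius` of the danger signal's source."""
    return math.dist(position, danger_source) <= radius


def classify(antibody, antigen, antigen_pos, danger_source, radius):
    """Return the slide's outcome label for one antibody/antigen pair."""
    if antibody != antigen:          # assumed matching rule: exact equality
        return "no match"
    if not in_danger_zone(antigen_pos, danger_source, radius):
        return "match, but too far away"
    return "stimulation"


damaged_cell = (0.0, 0.0)
print(classify("1010", "1010", (1.0, 1.0), damaged_cell, radius=2.0))
print(classify("1010", "1010", (5.0, 5.0), damaged_cell, radius=2.0))
print(classify("1010", "0101", (1.0, 1.0), damaged_cell, radius=2.0))
```

The same structure would work with a temporal zone by replacing the distance test with a time-window test.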



Towards a ‘dangerous’ IDS<br />

“The danger theory suggests that the<br />

immune system reacts to threats based on<br />

the correlation of various (danger) signals,<br />

providing a method of ‘grounding’ the<br />

immune response, i.e. linking it directly to<br />

the attacker.”<br />

Aickelin U, Bentley P, Cayzer S, Kim J and McLeod J (2003): 'Danger<br />

Theory: The Link between AIS and IDS?', Proceedings ICARIS-2003, 2nd<br />

International Conference on <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong>, LNCS 2787, pp<br />

147-155<br />



Other ways of using danger<br />

Danger = Crime, Antigen = Suspect<br />

or...<br />

Danger = Context ?<br />

It could also be useful for data mining, where the ‘danger’<br />

signal is a proxy measure of interest<br />

‘Danger Zone’ can be spatial or temporal<br />

Andrew Secker, Alex Freitas, and Jon Timmis (2005) “Towards a danger theory inspired<br />

artificial immune system for web mining” in A Scime, editor, Web Mining: applications and<br />

techniques, pages 145-168 (Idea Group)<br />


Some Recent Applications of<br />

Danger Theory<br />

• Anjum Iqbal, Mohd Aizaini Maarof, “Danger<br />

Theory and Intelligent Data Processing,”<br />

International Journal of Information Technology,<br />

Vol.1, No.1, 2004.<br />

• Andrew Secker, Alex A. Freitas, and Jon Timmis,<br />

“A Danger Theory Inspired Approach to Web<br />

Mining,” Computing Lab. University of Kent,<br />

Canterbury, Kent, UK.2005<br />

• And so on.<br />



The Future<br />

• More formal approach required?<br />

• Wide possible application domains.<br />

• What makes the immune system<br />

unique?<br />

• More work with immunologists:<br />

– Danger theory.<br />

– Idiotypic Networks.<br />

– Self-Assertion.<br />



Reference for further reading<br />

Books<br />

• <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong> and Their<br />

Applications by Dipankar Dasgupta (Editor)<br />

Springer Verlag, January 1999.<br />

• L.N. de Castro and J. Timmis, <strong>Artificial</strong> <strong>Immune</strong><br />

<strong>Systems</strong>: A New Computational Intelligence<br />

Approach, Springer, 2002.<br />

• A.O. Tarakanov, V.A. Skormin, and S.P. Sokolova,<br />

Immunocomputing: Principles and Applications,<br />

Springer, 2003.<br />

Related academic papers<br />

• J. Timmis, P.Bentley, and Emma Hart (Eds.): <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong>,<br />

Proceedings of Second International Conference, ICARIS 2003,<br />

Edinburgh, UK, September 2003. LNCS 2787, Springer.<br />



New Events:<br />

• Special Session on <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong> at the Congress<br />

on Evolutionary Computation (CEC), December 8-12, 2003,<br />

Canberra, Australia.<br />

• Special Session on Immunity-Based <strong>Systems</strong> at Seventh<br />

International Conference on Knowledge-Based Intelligent<br />

Information & Engineering <strong>Systems</strong> (KES), September 3-5,<br />

2003, University of Oxford, UK.<br />

• Second International Conference on <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong><br />

(ICARIS), September 1-3, 2003, Napier University, Edinburgh,<br />

UK.<br />

• Tutorial on <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong> at 1st Multidisciplinary<br />

International Conference on Scheduling: Theory and<br />

Applications (MISTA), 12 August 2003, The University of<br />

Nottingham, UK.<br />

• Tutorial on Immunological Computation at International Joint<br />

Conference on <strong>Artificial</strong> Intelligence (IJCAI), August 10, 2003,<br />

Acapulco, Mexico.<br />

• Special Track on <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong> at Genetic and<br />

Evolutionary Computation Conference (GECCO), Chicago, USA,<br />

July 12-16, 2003<br />



AIS Resources<br />

• <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong> and Their Applications by D<br />

Dasgupta (Editor), Springer Verlag, 1999.<br />

• <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong>: A New Computational<br />

Intelligence Approach by L de Castro, J Timmis, Springer<br />

Verlag, 2002.<br />

• Immunocomputing: Principles and Applications by A<br />

Tarakanov et al, Springer Verlag, 2003.<br />

• Third International Conference on <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong><br />

(ICARIS), September 13-16, 2004, University of Catania, Italy.<br />

• 4th International Conference on <strong>Artificial</strong> <strong>Immune</strong><br />

<strong>Systems</strong>(ICARIS), 14th-17th August, 2005 in Banff,<br />

Alberta, Canada<br />


That’s all<br />



Case Study 1:<br />

Malicious Executables Detection<br />

based on <strong>Artificial</strong> <strong>Immune</strong> Principles*<br />

From Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based Malicious<br />

Executables Detection Algorithm based on <strong>Immune</strong> Principles, F.Yin,<br />

J.Wang, C. Guo (Eds.): ISNN 2004, Springer, Lecture Notes in<br />

Computer Science 3174, pp. 675-680, 2004. (http://dblp.uni-trier.de)<br />

* This work was supported by Natural Science Foundation<br />

of China with Grant No. 60273100.<br />



Outline<br />

• Definition of Terms<br />

• Goal and Motivation<br />

• Previous Research works<br />

• <strong>Immune</strong> Principle for Malicious Executable<br />

Detection<br />

• Malicious Executable Detection Algorithm<br />

• Experiments and Discussion<br />

• Concluding Remarks<br />


Definition of Terms<br />

• Malicious Executable<br />

is generally defined as a program that has some<br />

malicious functions, such as compromising a<br />

system’s security, damaging a system or<br />

obtaining sensitive information without the<br />

permission of users. Examples include viruses, Trojan<br />

horses, and worms.<br />

• Benign Executable<br />

is a normal program without any malicious<br />

function.<br />



Tens of thousands of new viruses appear each year!<br />

Malicious executables:<br />

• DOS/Win32 viruses<br />

• Trojan horses<br />

• Worms<br />

• E-mail attached viruses<br />

Target: computers / information <strong>systems</strong><br />

But: current antivirus systems attempt to detect these new malicious programs with hand-crafted heuristics (costly and ineffective)<br />

Current task: devise new methods for detecting new malicious executables (ME)<br />


Definition of Symbols and<br />

Structures<br />

$B$: binary code alphabet, $B=\{0,1\}$.<br />
$\mathrm{Seq}(s,k,l)$: short-sequence cutting operation. If $s$ is a binary sequence $s=b(0)b(1)\dots b(n-1)$ with $b(i)\in B$, then $\mathrm{Seq}(s,k,l)=b(k)b(k+1)\dots b(k+l-1)$.<br />
$E(k)$: executable set, $k\in\{m,b\}$, where $m$ denotes malicious executables and $b$ benign executables.<br />
$E$: whole set of executables, i.e., $E=E(m)\cup E(b)$.<br />
$e(f_j,n)$: executable as a binary sequence of length $n$, where $f_j$ is the executable identifier.<br />
$l_d$: detector code length.<br />
$l_{step}$: step size of detector generation.<br />
$d_l$: detector, $d_l=\mathrm{Seq}(s,k,l)$.<br />
$D_l$: set of detectors with code length $l$, i.e., $D_l=\{d_l(0),d_l(1),\dots,d_l(n_d-1)\}$, $|D_l|=n_d$.<br />
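As a small illustration of the cutting operation Seq(s,k,l) defined above, the following sketch treats a binary sequence as a Python string of '0' and '1' characters. The function name `seq` and the string representation are assumptions made for this example, not part of the original slides.

```python
def seq(s: str, k: int, l: int) -> str:
    """Return Seq(s, k, l) = b(k) b(k+1) ... b(k+l-1) for a bit string s."""
    if k < 0 or l < 0 or k + l > len(s):
        raise ValueError("the window [k, k+l) must lie inside s")
    return s[k:k + l]


s = "0110100111"          # hypothetical binary sequence
print(seq(s, 2, 4))       # the four bits b(2)..b(5)
```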


Goal and Motivation<br />

• To develop an automatic approach for detecting new malicious executables.<br />
• To use an artificial immune system (AIS) together with artificial neural networks (ANN) to detect malicious executables with a higher Detection Rate (DR) and a lower False Positive Rate (FPR) than other methods.<br />


Previous Related Works<br />

• Signature-based Methods<br />

• Expert Knowledge-based Methods<br />

• Machine Learning Methods<br />


Signature-based Methods<br />

Creates a unique tag (signature) for each malicious program so that future examples of it can be classified correctly with a small error rate. Detection models are generated from the signatures of known malicious executables.<br />
Drawbacks:<br />
• Cannot detect unknown or mutated viruses.<br />
• As the number and variety of viruses grow, detection becomes dramatically slower. At the same time, analysing virus signatures becomes very difficult, particularly for encrypted signatures.<br />
(See R.W. Lo, K.N. Levitt, and R.A. Olsson. MCF: a Malicious Code Filter. Computers & Security, 14(6):541–566, 1995.)<br />


Expert Knowledge-based<br />

Methods<br />

Uses the knowledge of a group of virus experts to construct heuristic classifiers for detecting unknown viruses.<br />
Drawbacks:<br />
• The analysis is time-consuming.<br />
• Only some unknown viruses are discovered, and the false detection rate is very high.<br />
For ANN-based detection of unknown viruses, the IBM Anti-virus Group has also proposed a method, but it detects boot-sector viruses only.<br />
(See W. Arnold and G. Tesauro. Automatically Generated Win32 Heuristic Virus Detection. Proceedings of the 2000 International Virus Bulletin Conference, 2000.)<br />


Machine Learning Methods<br />

• M.G. Schultz developed a framework that uses data mining algorithms, such as the Multi-Naïve Bayes method, to train multiple classifiers on a set of malicious and benign executables in order to detect new examples (unknown ME).<br />
(See M.G. Schultz, E. Eskin and E. Zadok. Data Mining Methods for Detection of New Malicious Executables. IEEE Symposium on Security and Privacy, May 2001.)<br />



Biologically-motivated Information<br />

Processing <strong>Systems</strong><br />

• Brain-nervous systems – Neural Networks (NN)<br />

• Genetic systems – Genetic Algorithms(GA)<br />

• <strong>Immune</strong> systems – <strong>Artificial</strong> <strong>Immune</strong> <strong>Systems</strong>(AIS)<br />

or immunological computation.<br />

NN and GA have been studied extensively and applied widely, whereas AIS has relatively few applications so far.<br />



Natural prototypes vs. their models<br />

Natural prototype | Biological level | Computing model<br />
Natural language | Left hemisphere of brain | Formal logic, formal linguistics<br />
Brain nervous net | Cells | Artificial neural networks (ANN)<br />
Biological cells | Cells | Cellular automata (CA)<br />
Molecules of proteins | Molecular | Artificial immune systems (AIS)<br />
Genetic code | Molecular | Genetic algorithms (GA)<br />



Comparison of Three Algorithms<br />

Aspect | GA (Optimisation) | NN (Classification) | AIS<br />
Components | Chromosome strings | Artificial neurons | Attribute strings<br />
Location of components | Dynamic | Pre-defined | Dynamic<br />
Structure | Discrete components | Networked components | Discrete components / networked components<br />
Knowledge storage | Chromosome strings | Connection strengths | Component concentration / network connections<br />
Dynamics | Evolution | Learning | Evolution / learning<br />
Meta-dynamics | Recruitment / elimination of components | Construction / pruning of connections | Recruitment / elimination of components<br />
Interaction between components | Crossover | Network connections | Recognition / network connections<br />
Interaction with environment | Fitness function | External stimuli | Recognition / objective function<br />


<strong>Immune</strong> Principles for Malicious<br />

Executable Detection<br />

• Non-self Detection Principle<br />

• Anomaly Detection Based on Thickness<br />

• The Diversity of Detector Representation<br />

vs. Anomaly Detection Hole<br />



Non-self Detection Principle<br />

• In the natural immune system, all cells of the body are categorized as either self or non-self. The immune process detects the non-self cells.<br />
• To realize non-self detection, T lymphocytes undergo two selection stages during maturation, Positive Selection and Negative Selection, in which antigenic encounters may result in cell death. Inspired by these two stages, computer scientists have proposed algorithms for detecting anomalous information. Here, we use the Positive Selection Algorithm (PSA) to perform non-self detection for recognizing malicious executables.<br />


Non-self Detection by PSA<br />

Figure: process of anomaly detection with PSA. A short sequence of length $l$ is compared with the detector set $D_l$; if it matches (Y) it is classified as self, otherwise (N) as non-self.<br />
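The matching step of the PSA can be sketched as a set lookup. This is a minimal sketch that assumes exact matching between a short sequence and the detectors; the slides only say "Match", so the matching rule, the function name `classify_unit`, and the sample bytes are all assumptions.

```python
def classify_unit(unit: bytes, detectors: set[bytes]) -> str:
    """Label a short sequence as self if it matches a detector, else non-self."""
    # Detectors come from benign files (positive selection), so a match means "self".
    return "self" if unit in detectors else "non-self"


# Hypothetical 24-bit (3-byte) detector set.
detectors = {bytes.fromhex("563212"), bytes.fromhex("32120a")}
print(classify_unit(bytes.fromhex("563212"), detectors))
print(classify_unit(bytes.fromhex("ffffff"), detectors))
```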


Anomaly Detection Based on<br />

Thickness<br />

• Anomaly recognition is the process in which immune cells detect antigens and become activated.<br />
• The activation threshold of the immune cells is determined by the thickness (concentration) of immune cells matching the antigens.<br />



The Diversity of Detector Representation<br />

vs. Anomaly Detection Hole<br />

• The main difficulty of anomaly detection is shrinking the anomaly detection holes as far as possible. The natural immune system handles this well through the diversity of MHC (Major Histocompatibility Complex) representations, which determines the diversity of the antibodies attached to the surface of T cells. This property is very useful for increasing the power to detect mutated antigens and for shrinking the anomaly detection holes.<br />
• Following this principle, we can use diverse detector representations to shrink the anomaly detection holes, as the following schematic drawings illustrate.<br />



Schematic diagram of abnormal<br />

detection holes (cont’)<br />

Figure labels: self space, non-self space, detectors, abnormal detection holes.<br />


Reduction of abnormal detection<br />

holes by use of the diversity of<br />

detector representations<br />

Figure: regions covered by detector representations 1, 2 and 3, and by the combination of detectors.<br />



Malicious Executable Detection<br />

Algorithm (MEDA)<br />

MEDA based on AIS includes three parts:<br />
• Detector generation,<br />
• Anomaly information extraction,<br />
• and Classification.<br />


Flow Chart of Malicious Executable<br />

Detection Algorithm (MEDA)<br />

Flow (from the chart):<br />
1. Gene (…01101001…) → generating detector set.<br />
2. Executable to be detected (…00111101…) → MEDA: extracting anomaly property → classifier → output.<br />
3. Update gene (…10101101…).<br />



Generation of Detector Set<br />

Detector generation algorithm:<br />

• Begin: initialize $l_{step}$, $l_d$, $k=0$<br />

• Do cutting $e(f_k,n)$ from $E_g(b)$<br />

• $i=0$;<br />

• While i



Illustration of Detector Generating<br />

Process<br />

File Hex Sequence: 56 32 12 0A 34 ED FF 00 2D…. . 00 0A 34 ED FF FA 11 00<br />

Extracting Detector: 56 32 12<br />

32 12 0A<br />

12 0A 34<br />

┋……………………………………………┋<br />

FF FA 11<br />

FA 11 00<br />

Generating process of 24-bit detectors with an 8-bit step size ($l_d=24$, $l_{step}=8$)<br />
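The sliding-window extraction illustrated above can be sketched as follows. This is a minimal sketch that assumes byte-aligned values of l_d and l_step (as in the 24-bit / 8-bit illustration); the function name `generate_detectors` and the use of a Python set as the detector store are choices made for this example, not the storage structures used in the original work.

```python
def generate_detectors(data: bytes, l_d: int = 24, l_step: int = 8) -> set[bytes]:
    """Cut every l_d-bit window from data, advancing l_step bits each time."""
    if l_d <= 0 or l_step <= 0 or l_d % 8 or l_step % 8:
        raise ValueError("this sketch only handles positive, byte-aligned lengths")
    width, step = l_d // 8, l_step // 8
    return {data[i:i + width] for i in range(0, len(data) - width + 1, step)}


# First five bytes of the hex sequence shown in the illustration.
sample = bytes.fromhex("5632120A34")
for d in sorted(generate_detectors(sample)):
    print(d.hex(" ").upper())
```

A detector set for a whole gene would be the union of the sets produced for each benign file in it.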



Extraction of Anomaly Characteristics --<br />

Non-self Thickness (NST)<br />

• Non-self Detection<br />

• NST, the anomaly property, is defined as the ratio of the number of non-self units to the total number of units in the file's binary sequence: $p_l=n_n/(n_n+n_s)$, where $n_n$ and $n_s$ count the non-self and self units.<br />
• If there are $m$ kinds of detectors, the file has an NST vector $P=(p_{l_1},p_{l_2},\dots,p_{l_m})^T$.<br />
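The NST definition above can be sketched in a few lines. This is an illustrative sketch only: it assumes byte-aligned code lengths, exact matching against the detector set, and a sliding window with an 8-bit step; the function names `nst` and `nst_vector` and the sample data are hypothetical.

```python
def nst(data: bytes, detectors: set[bytes], l_d: int, l_step: int = 8) -> float:
    """Non-self thickness p_l = n_n / (n_n + n_s) for one detector set."""
    width, step = l_d // 8, l_step // 8
    n_s = n_n = 0
    for i in range(0, len(data) - width + 1, step):
        if data[i:i + width] in detectors:
            n_s += 1          # unit matches a detector: self
        else:
            n_n += 1          # no match: non-self
    total = n_n + n_s
    return n_n / total if total else 0.0


def nst_vector(data: bytes, detector_sets: dict[int, set[bytes]]) -> list[float]:
    """NST vector P, one component per code length, in increasing l_d order."""
    return [nst(data, d, l_d) for l_d, d in sorted(detector_sets.items())]


# Hypothetical detector sets and file contents.
d16 = {bytes.fromhex(h) for h in ("5632", "3212", "120a", "0aff")}
d24 = {bytes.fromhex(h) for h in ("563212", "32120a")}
file_bytes = bytes.fromhex("5632120aff")
print(nst_vector(file_bytes, {16: d16, 24: d24}))
```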



NST Extraction Diagram<br />

1. Initialization: choose $l_{step}$, $l_d$ and $D_l$.<br />
2. “Nonself” detection on the next unit of the file to be detected (…00111101).<br />
3. Is it “nonself”? If yes, add 1 to $n_n$; if no, add 1 to $n_s$.<br />
4. Detection complete? If no, return to step 2; if yes, compute $p_l=n_n/(n_n+n_s)$.<br />
5. End.<br />



NST Extraction Algorithm<br />

• Begin: open $e(f_k,n)$;<br />

• Select $l_{step}$, $l_d$;<br />

• Set $n_s=0$, $n_n=0$, $i=0$;<br />

• While i


BP Network Classifier<br />

• We use the Anomaly Property Vector (APV), i.e., the NST vector $P$, as the input of a two-layer BP network classifier. The number of input-layer nodes equals the dimension of the APV.<br />
• A sigmoid transfer function is chosen for the hidden layer and a linear function for the output layer.<br />
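The forward pass of such a classifier (sigmoid hidden layer, linear output) can be sketched with the standard library alone. The weights below, the hidden-layer size, and the 0.5 decision threshold are placeholders assumed for illustration; in practice the weights would come from back-propagation training, which is not shown.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(p, w_hidden, b_hidden, w_out, b_out):
    """Two-layer network: sigmoid hidden units, one linear output unit."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, p)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical 4-dimensional NST vector and untrained (all-zero) hidden weights.
p = [0.2, 0.4, 0.6, 0.8]
out = forward(p, [[0.0] * 4, [0.0] * 4], [0.0, 0.0], [1.0, 1.0], 0.0)
label = "ME" if out >= 0.5 else "BE"   # assumed threshold between the 0 and 1 targets
print(out, label)
```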


BP Network Classifier Structure<br />

Figure: the non-self thickness (NST) vector $P=(p_{l_1},p_{l_2},\dots,p_{l_m})^T$ feeds the input layer; the network output is 1 for ME and 0 for BE.<br />


Experiments and Discussion<br />

• Experimental Data Set<br />

• Generation of Detector Set<br />

• Experimental Result Using Single Detector Set<br />

• Experimental Result Using Multi-Detector Set<br />


Experimental Data Set<br />

Type | Files | Remarks<br />
BE | 915 | Win 2K OS and some application programs.<br />
ME | 3566 | DOS viruses, Win32 viruses, Trojans, worms, etc., from the Internet.<br />
Total | 4481 | All verified by antivirus cleaner tools.<br />
• BE: Benign Executable<br />
• ME: Malicious Executable<br />


Generation of Detector Set<br />

• $E_g(b)$ is the gene used for generating detectors, $l_d\in\{16,24,32,64,96\}$, and $l_{step}=8$ bits. Using the detector generation algorithm, we obtain D16, D24, D32, D64 and D96, respectively.<br />
Table 1: Detector generation results<br />
Code length $l_d$ | 16 | 24 | 32 | 64 | 96<br />
$\lvert D_{l_d}\rvert$ | 65,536 | 10,931,627 | 8,938,352 | 12,768,361 | 21,294,857<br />
Store structure | Bitmap index | Bitmap index | Tree | Tree | Tree<br />



Detection Result of Malicious<br />

Executables by D24<br />

(a) NST of files (“non-self” thickness $p_{24}$ versus file no.), where ‘x’ represents benign programs (red) and ‘□’ malicious programs (blue).<br />
(b) ROC curve (detection rate, %, versus false positive rate, %).<br />



Detection Result of Malicious<br />

Executables by D32<br />

(a) NST of files (“non-self” thickness $p_{32}$ versus file no.), where ‘x’ represents benign programs and ‘□’ malicious programs.<br />
(b) ROC curve (detection rate, %, versus false positive rate, %).<br />



Detection Result of Malicious<br />

Executables by D64<br />

(a) NST of files (“non-self” thickness $p_{64}$ versus file no.), where ‘x’ represents benign programs (red) and ‘□’ malicious programs (blue).<br />
(b) ROC curve (detection rate, %, versus false positive rate, %).<br />



Experimental Result Using Single<br />

Detector Set<br />

Figure: ROC curves (detection rate, %, versus false positive rate, %, both from 0 to 100) for the 16-, 24-, 32-, 64- and 96-bit detector sets.<br />
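One way to obtain DR/FPR points of this kind is to sweep a threshold over the NST values of the test files. The sketch below assumes that a file is flagged as malicious when its NST is at or above the threshold; that decision rule, the function names, and the NST values are assumptions for illustration, not the measured data from the experiments.

```python
def dr_fpr(nst_benign, nst_malicious, threshold):
    """Return (DR %, FPR %) when files with NST >= threshold are flagged."""
    tp = sum(1 for p in nst_malicious if p >= threshold)
    fp = sum(1 for p in nst_benign if p >= threshold)
    return 100.0 * tp / len(nst_malicious), 100.0 * fp / len(nst_benign)


def roc_points(nst_benign, nst_malicious):
    """(FPR %, DR %) pairs for every observed NST value used as threshold."""
    thresholds = sorted(set(nst_benign) | set(nst_malicious), reverse=True)
    points = []
    for t in thresholds:
        dr, fpr = dr_fpr(nst_benign, nst_malicious, t)
        points.append((fpr, dr))
    return points


# Hypothetical NST values for four benign and four malicious files.
benign = [0.1, 0.2, 0.3, 0.6]
malicious = [0.4, 0.7, 0.8, 0.9]
print(dr_fpr(benign, malicious, 0.5))
print(roc_points(benign, malicious))
```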


Relationship between DR and code length $l_d$ at fixed FPR<br />
Figure: detection rate (%) versus code length $l_d$ (bits).<br />
Note: from bottom to top, the curves correspond to FPR = 0%, 0.5%, 1%, 2%, 4%, 8%, and 16%.<br />



Experimental Result Using Multi-<br />

Detector Set<br />

• This experiment uses multiple detector sets to distinguish benign from malicious executables.<br />
• We do not use D16 because its DR is zero, and we take D96 as the upper limit because the DR values are almost the same for $l_d \ge 96$.<br />
• We select the four detector sets D24, D32, D64 and D96 as the anomaly detection data set, use them to extract the non-self thickness (NST) vector, and finally use a BP network as the classifier.<br />
• For classification, we randomly select 30% of the files in $E(b)$ as $E_g(b)$ to train a BP network, and use the remaining data to illustrate the anomaly detection performance.<br />



NST Distribution and ROC Curve of<br />

Multi-Detector Set Method<br />

(a) NST of files for the mixture of D24, D32 and D64 (axes: “non-self” thickness at 24, 32 and 64 bits). ‘x’ marks benign programs (red), ‘□’ malicious programs (blue).<br />
(b) ROC curve of the mixed detector set of D24, D32, D64 and D96 (detection rate, %, versus false positive rate, %).<br />



Comparisons With Bayes Methods<br />

and Signature-based Method<br />

Figure: ROC curves (detection rate, %, from 0 to 100, versus false positive rate, %, from 0 to 12) for MEDA with BP network, Naive Bayes with strings, Multi-Naive Bayes with bytes, and the signature method.<br />


Algorithm Complexities<br />

Algorithm | Operation type 1 (name: amount) | Operation type 2 (name: amount) | Operation type 3 (name: amount) | Store space<br />
MEDA | Detectors: $l_{train}$ | Detector matching: $\le 80\times l_{test}$ | Computing NST: $4\times l_f$ additions | 0.4 Gb<br />
Bayes | Prob. info.: $\gg l_{train}$ | Searching $P(F_i/C)$: depends on $P(F_i/C)$ | Computing joint probs. $P(C)\prod_{i=1}^{n}P(F_i/C)$: $l_f$ float multiplications | 1 Gb<br />



Remarks<br />

• For short binary sequences and a single detector set, D24 performs best at detecting malicious executables, giving a DR of 80.6% at an FPR of 3%.<br />
• With longer detector code lengths and multiple detector sets, our method obtains its best performance, a DR of 97.46% at an FPR of 2%, which exceeds current methods.<br />
• This result supports two ideas:<br />
– diversity of detector representation can decrease anomaly detection holes;<br />
– “non-self” thickness detection.<br />



Case Study 2: Film Recommender<br />
From Dr. Uwe Aickelin (http://www.aickelin.com), University of Nottingham, U.K.<br />
• Prediction:<br />
– What rating would I give a specific film?<br />
• Recommendation:<br />
– Give me a ‘top 10’ list of films I might like.<br />



Film Recommender (cont'd 1)<br />

• EachMovie database (70k users).<br />

• User Profile: set of tuples {movie, rating}.<br />

• Me: My user profile.<br />

• Neighbour: User profile of others.<br />

• Similarity metric: Correlation score.<br />

• Neighbourhood: Group of similar users.<br />

• Recommendations: From neighbourhood.<br />
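A correlation score between two user profiles can be sketched as a Pearson coefficient over the films both users have rated. Restricting the computation to co-rated films and returning 0 when the overlap is too small are assumptions of this sketch; the profiles and the function name `correlation` are hypothetical.

```python
import math


def correlation(profile_a: dict, profile_b: dict) -> float:
    """Pearson correlation over the movies rated in both profiles."""
    common = sorted(set(profile_a) & set(profile_b))
    n = len(common)
    if n < 2:
        return 0.0                      # too little overlap to correlate
    a = [profile_a[m] for m in common]
    b = [profile_b[m] for m in common]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0                      # constant ratings carry no signal
    return cov / math.sqrt(var_a * var_b)


# Hypothetical profiles: movie -> rating.
me = {"m1": 1, "m2": 2, "m4": 4}
neighbour = {"m1": 2, "m2": 4, "m3": 5, "m4": 8}
print(correlation(me, neighbour))
```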



Film Recommender (cont'd 2)<br />
• User Profile: set of tuples {movie, rating}.<br />
• Me: my user profile. → Antigen<br />
• Neighbour: user profile of others. → Antibody<br />
• Affinity metric: correlation score. → Antibody–antigen binding (stimulation) and antibody–antibody binding (suppression)<br />
• Neighbourhood: group of similar users. → Group of antibodies similar to the antigen and dissimilar to other antibodies<br />
• Recommendations: from neighbourhood. → Weighted score based on similarities<br />



Film Recommender (cont'd 3)<br />

• Start with empty AIS.<br />

• Encode target user as an antigen Ag.<br />

• WHILE (AIS not full) && (More Users):<br />

– Add next user as antibody Ab.<br />

– IF (AIS at full size) Iterate AIS.<br />

• Generate recommendations from AIS.<br />



Film Recommender (cont'd 4)<br />

Suppose we have 5 users and 4 movies:<br />

– u1={(m1,v11),(m2,v12),(m3,v13)}.<br />

– u2={(m1,v21),(m2,v22),(m3,v23),(m4,v24)}.<br />

– u3={(m1,v31),(m2,v32),(m4,v34)}.<br />

– u4={(m1,v41),(m4,v44)}.<br />

– u5={(m1,v51),(m2,v52),(m3,v53), (m4,v54)}.<br />

• We do not have users’ votes for every film.<br />

• We want to predict the vote of user u4 on movie<br />

m3.<br />



Algorithm walkthrough (1)<br />

Start with an empty AIS:<br />
AIS: (empty)<br />
DATABASE: $u_1, u_2, u_3, u_4, u_5$<br />
The user for whom to predict becomes the antigen:<br />
AIS: Ag = $u_4$<br />
DATABASE: $u_1, u_2, u_3, u_5$<br />



Algorithm walkthrough (2)<br />

Add antibodies until the AIS is full…<br />
AIS: Ag ($u_4$), Ab 1 ($u_1$); DATABASE: $u_2, u_3, u_5$<br />
AIS: Ag ($u_4$), Ab 1, Ab 2, Ab 3<br />



Algorithm walkthrough (3)<br />

• Table of Correlation between Ab<br />

and Ag:<br />

– MS14, MS24, MS34.<br />


• Table of Correlation between<br />

Antibodies:<br />

– MS12 = CorrelCoef(Ab1, Ab2)<br />

– MS13 = CorrelCoef(Ab1, Ab3)<br />

– MS23 = CorrelCoef(Ab2, Ab3)<br />



Algorithm walkthrough (4)<br />

• Calculate Concentration of each Ab:<br />

– Interaction with Ag (Stimulation).<br />

– Interaction with other Ab (Suppression).<br />
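The slides do not give the concentration equation, so the following is only a hedged sketch of one possible update: each antibody gains concentration in proportion to its affinity to the antigen (stimulation) and loses it in proportion to its affinity to the other antibodies (suppression), plus a constant death rate. The functional form, the rate constants, and the name `update_concentrations` are all assumptions made for illustration.

```python
def update_concentrations(conc, ag_affinity, ab_affinity,
                          stimulation=0.3, suppression=0.1, death=0.05):
    """One illustrative update step for antibody concentrations."""
    n = len(conc)
    new_conc = []
    for i in range(n):
        # Average pull from the other antibodies, weighted by their concentration.
        others = sum(ab_affinity[i][j] * conc[j] for j in range(n) if j != i)
        others /= max(n - 1, 1)
        delta = conc[i] * (stimulation * ag_affinity[i] - suppression * others - death)
        new_conc.append(max(conc[i] + delta, 0.0))   # concentrations stay non-negative
    return new_conc


# Hypothetical affinities (correlation scores) for Ab 1..Ab 3.
ag_aff = [0.2, 0.9, 0.1]
no_suppression = [[0.0] * 3 for _ in range(3)]
print(update_concentrations([1.0, 1.0, 1.0], ag_aff, no_suppression))
```

With these placeholder numbers the antibody most similar to the antigen gains the most concentration, which matches the qualitative behaviour described in the walkthrough.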

Figure: the AIS before the update (Ag, Ab 1, Ab 2, Ab 3) and after it (Ag with multiple copies of Ab 1 and Ab 2).<br />



Algorithm walkthrough (5)<br />

• Generate Recommendation based on<br />

Antibody Concentration.<br />

The recommendation for user $u_4$ on movie $m_3$ will be based largely on user $u_2$'s vote on $m_3$.<br />
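A prediction of this kind can be sketched as a weighted average of the neighbours' votes. Here each weight is assumed to combine the neighbour's correlation with the target user and its antibody concentration; that weighting, the function name `predict`, and the numbers are hypothetical and not taken from the original study.

```python
def predict(votes_and_weights):
    """Weighted average of neighbours' votes; None if no positive weight."""
    total = sum(w for _, w in votes_and_weights if w > 0)
    if total <= 0:
        return None
    return sum(v * w for v, w in votes_and_weights if w > 0) / total


# Hypothetical votes on m3: (vote, weight), where weight = correlation x concentration.
votes_on_m3 = [(0.8, 0.9),   # heavily weighted neighbour
               (0.4, 0.1)]   # weakly weighted neighbour
print(predict(votes_on_m3))
```

Neighbours who have not voted on the film (for example, a profile without m3) simply do not appear in the list.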



Film Recommender Results<br />

• Tested against standard method (Pearson<br />

k-nearest neighbours).<br />

• Prediction:<br />

– Results of same quality.<br />

• Recommendation:<br />

– 4 out of 5 films correct (AIS).<br />

– 3 out of 5 films correct (Pearson).<br />
