
Immunology as a Metaphor for Computational ... - Napier University


Chapter 4. Applying an Immune System Analogy to Data-Clustering Problems

4.2.2 Non-Stationary Data

Section 4.2.1 describes how N antigens can be generated from s schemas in equal proportions. All experiments using static data are generated from one of the three categories of data described there: half-length schemas, quarter-length schemas and eighth-length schemas. Within each of these three categories the schemas do not overlap. A more general method of generating data is to generate the schemas themselves randomly, by choosing a random start point along the string and then setting d contiguous bits to 1. All remaining L − d bit positions contain wild-cards. Schemas are more likely to overlap in this case, but this is likely to be a more accurate reflection of real datasets. This method can easily be adapted to produce non-stationary datasets using the algorithm shown in figure 4.1. Generating datasets in this manner enables the proposed algorithms to be evaluated with respect to the following properties of the datasets:

1. The number of clusters present in the dataset
2. The length of the defined section of each cluster
3. The extent of overlap between the clusters
4. The rate of change of the dataset, in terms of both the number of clusters replaced at each update and the rate at which updates occur.

This will enable some conclusions to be drawn about the suitability of the suggested approaches for clustering real-world datasets, in which sensible estimates can be made of the likely rate of change of the data and of the characteristics of the dataset.

4.2.2.1 Relevance of Data Generation Approach to Real Datasets

In a real dataset, we would expect to observe clusters of data in which the items within each cluster share common features.
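The random schema-generation method of Section 4.2.2 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the author's implementation: `#` is assumed as the wild-card symbol, each wild-card in an antigen is assumed to be filled with a uniformly random bit, and the defined block is assumed not to wrap around the end of the string. All function names are hypothetical. The `overlap` count and the `update_schemas` step are likewise hypothetical illustrations of properties 3 and 4 listed above; they are not the algorithm of figure 4.1.

```python
import random
from typing import List

# Assumption: '#' marks a wild-card position (the symbol is not fixed by the text).
WILDCARD = "#"


def random_schema(L: int, d: int, rng: random.Random) -> str:
    """Return a length-L schema with d contiguous 1s at a random start point.

    Assumption: the defined block must fit inside the string (no wrap-around),
    so the start point is drawn from 0..L-d inclusive.
    """
    if not 0 < d <= L:
        raise ValueError("require 0 < d <= L")
    start = rng.randint(0, L - d)  # randint is inclusive at both ends
    return WILDCARD * start + "1" * d + WILDCARD * (L - d - start)


def antigen_from_schema(schema: str, rng: random.Random) -> str:
    """Keep the defined bits and fill each wild-card with a random bit (assumption)."""
    return "".join(rng.choice("01") if c == WILDCARD else c for c in schema)


def generate_antigens(schemas: List[str], n: int, rng: random.Random) -> List[str]:
    """Draw n antigens from the schemas in round-robin order.

    Proportions are exactly equal when n is a multiple of len(schemas).
    """
    return [antigen_from_schema(schemas[i % len(schemas)], rng) for i in range(n)]


def overlap(a: str, b: str) -> int:
    """Hypothetical overlap measure: number of positions defined in both schemas."""
    return sum(1 for x, y in zip(a, b) if x != WILDCARD and y != WILDCARD)


def update_schemas(schemas: List[str], k: int, L: int, d: int,
                   rng: random.Random) -> List[str]:
    """Hypothetical non-stationary update (not the algorithm of figure 4.1):
    replace k randomly chosen schemas with freshly generated random ones."""
    updated = list(schemas)
    for i in rng.sample(range(len(updated)), k):
        updated[i] = random_schema(L, d, rng)
    return updated


if __name__ == "__main__":
    rng = random.Random(1)
    L, d, s = 32, 8, 4  # string length, defined-section length, number of schemas
    schemas = [random_schema(L, d, rng) for _ in range(s)]
    antigens = generate_antigens(schemas, 100, rng)
    # One update step replacing a single cluster; how often this is called
    # would set the rate at which the data changes.
    schemas = update_schemas(schemas, 1, L, d, rng)
    print(schemas)
```

In this sketch the four dataset properties map onto parameters: `s` sets the number of clusters, `d` the length of the defined section, the random start points determine how much clusters overlap, and `k` together with the frequency of calls to `update_schemas` sets the rate of change.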
However, it is extremely likely that at least some real-world clusters will not be distinct but will overlap with each other. Therefore, when designing an artificial dataset to test the proposed models, it is essential that the dataset exhibit at least two characteristics if performance on it is to be indicative of real-world problems:
