
Consider the following example: A dataset is given by [1, 1, 4, 3, 5, 2, 6, 2, 4], and point P is equal to 5. Figure 7-4 shows how kNN would be applied to this dataset.

Now, if you specify that k is equal to 3, there are, based on distance, three nearest neighbors to the point 5. Those neighbors are 4, 4, and 6. So, based on the kNN algorithm, query point P is classified as 4, because 4 is the majority value among the three points nearest to it. kNN classifies every other query point by the same majority principle.

When using kNN, it's crucial to choose a k value that minimizes noise (unexplainable random variation, in other words). At the same time, you must choose a k value that includes sufficient data points in the selection process. If the data points aren't uniformly distributed, it's generally harder to predetermine a good k value. Be careful to select an optimal k value for each dataset you're analyzing.

Large k values tend to produce less noise and more boundary smoothing (clearer definition and less overlap) between classes than small k values do.
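One practical way to pick k for a given dataset is to scan candidate values with cross-validation. The sketch below assumes scikit-learn is available; its bundled iris data is just a stand-in for whatever dataset you're actually analyzing.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 21):
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean 5-fold cross-validated accuracy for this value of k
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best k = {best_k} (cv accuracy = {scores[best_k]:.3f})")
```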

Knowing when to use the k-nearest neighbor algorithm

kNN is particularly useful for multi-label learning: supervised learning in which the algorithm automatically learns from (detects patterns in) multiple sets of instances, each of which could have several classes of its own. With multi-label learning, the algorithm learns to predict multiple class labels for each new instance it encounters.
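As a toy illustration of this, scikit-learn's KNeighborsClassifier accepts a binary indicator matrix as its target, so a single fitted model predicts a full label vector per instance. The data here is made up purely for demonstration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy multi-label data: each row of y flags which of three labels
# apply to the matching instance in X
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8], [0.5, 0.5]])
y = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 1],
              [1, 1, 0]])

model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# Each prediction is a full label vector, voted label by label
print(model.predict([[0.1, 0.1], [0.9, 0.9]]))  # [[1 0 0], [0 1 1]]
```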

The problem with kNN is that it takes a lot longer than other classification methods to classify a sample. Nearest neighbor classifier performance depends on the cost of calculating the distance function and on the value of the neighborhood parameter k. You can try to speed things up by specifying optimal values for k and n.
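One common speed-up is to index the training data so that each query avoids a brute-force scan of every point. The sketch below assumes scikit-learn, whose algorithm="kd_tree" option builds one such index; the random data is only a placeholder for a real training set.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Illustrative random data standing in for a real training set
rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))
y_train = (X_train.sum(axis=1) > 1.5).astype(int)

# algorithm="kd_tree" indexes the training points in a k-d tree,
# so each query avoids a brute-force scan over all 10,000 of them
fast_knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
fast_knn.fit(X_train, y_train)
print(fast_knn.predict(rng.random((3, 3))))
```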

Exploring common applications of k-nearest neighbor algorithms

kNN is often used for internet database management purposes. In this capacity, kNN is useful for website categorization, web page ranking, and analyzing other user dynamics across the web.

kNN classification techniques are also quite beneficial in customer relationship management (CRM), a set of processes that help a business sustain strong relationships with its clients while simultaneously increasing its revenues. Most CRMs benefit tremendously from using kNN to data-mine customer information for patterns that are useful in boosting customer retention.

The method is so versatile that even if you're a small-business owner or a marketing department manager, you can easily use kNN to boost your own marketing return on investment. Simply use kNN to analyze your customer data for purchasing patterns, and then use those findings to customize marketing initiatives so that they're more precisely targeted to your customer base.
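As a rough sketch of what that analysis might look like, the snippet below uses scikit-learn's NearestNeighbors to find the customers whose purchasing behavior most resembles a given customer's. The customer table and its three features are hypothetical; in practice you'd also scale the features so that no single one (such as spend) dominates the distance.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical customer table: one row per customer, columns are
# (monthly spend, orders per month, months since last purchase)
customers = np.array([
    [120.0, 4, 1],
    [ 95.0, 3, 2],
    [300.0, 9, 0],
    [ 15.0, 1, 8],
    [110.0, 4, 1],
])

# Find the customers whose purchasing pattern is closest to customer 0;
# the first hit is customer 0 itself, at distance zero
nn = NearestNeighbors(n_neighbors=3).fit(customers)
distances, indices = nn.kneighbors(customers[[0]])
print(indices)  # [[0 4 1]] -- customers 4 and 1 look most similar
```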
