Consider the following example: A dataset is given by [1, 1, 4, 3, 5, 2, 6, 2, 4], and point P is
equal to 5. Figure 7-4 shows how kNN would be applied to this dataset.
Now, if you specify that k is equal to 3, there are, based on distance, three nearest
neighbors to the point 5. Those neighbors are 4, 4, and 6. So, based on the kNN algorithm, query
point P is classified as 4, because 4 is the majority value among the k points nearest
to it. kNN classifies every other query point by using the same majority principle.
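You can sketch this one-dimensional example in a few lines of Python. (In this toy setup, each point's value doubles as its class label, just as in the figure; the function name is made up for illustration.)

```python
from collections import Counter

def knn_classify_1d(data, query, k):
    # Sort the other points by absolute distance to the query point,
    # excluding the query point itself, as the book's figure does.
    neighbors = sorted((x for x in data if x != query),
                       key=lambda x: abs(x - query))
    # Majority vote among the k nearest values.
    return Counter(neighbors[:k]).most_common(1)[0][0]

data = [1, 1, 4, 3, 5, 2, 6, 2, 4]
print(knn_classify_1d(data, 5, 3))  # -> 4 (neighbors are 4, 4, and 6)
```

The three nearest neighbors of 5 are 4, 6, and 4, so the majority vote returns 4, matching the walkthrough above.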
When using kNN, it’s crucial to choose a k value that minimizes noise (unexplainable
random variation, in other words). At the same time, you must choose a k value that includes
sufficient data points in the selection process. If the data points aren’t uniformly distributed,
it’s generally harder to predetermine a good k value. Be careful to select an optimum k value
for each dataset you’re analyzing.
Large k values tend to produce less noise and more boundary smoothing (that is, clearer
definition and less overlap) between classes than small k values do.
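To see how a larger k smooths out noise, consider a sketch with one deliberately mislabeled point. (The data and function here are invented for illustration.) With k = 1, the noisy point decides the outcome; with k = 3, its honest neighbors outvote it:

```python
from collections import Counter

def knn_predict(points, query, k):
    """points: list of (value, label) pairs.
    Return the majority label among the k nearest values."""
    nearest = sorted(points, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# The point (3, 'b') is noise: it sits in the middle of class 'a' territory.
points = [(1, 'a'), (2, 'a'), (4, 'a'), (5, 'a'),
          (3, 'b'), (10, 'b'), (11, 'b'), (12, 'b')]

print(knn_predict(points, 3.2, k=1))  # -> 'b' (the lone noisy point wins)
print(knn_predict(points, 3.2, k=3))  # -> 'a' (larger k smooths out the noise)
```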
Knowing when to use the k-nearest neighbor algorithm
kNN is particularly useful for multi-label learning: supervised learning in which the algorithm
automatically learns from (detects patterns in) multiple sets of instances, each of which can
carry several class labels of its own. With multi-label learning, the
algorithm learns to predict multiple class labels for each new instance it encounters.
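One simple way to sketch multi-label kNN is to predict every label that appears on at least half of the k nearest neighbors, rather than taking a single majority vote. (The data, label names, and threshold rule here are assumptions made for illustration, not the book's method.)

```python
from collections import Counter

def multilabel_knn(points, query, k, threshold=0.5):
    """points: list of (value, set_of_labels) pairs.
    Predict every label carried by at least `threshold` of the k nearest."""
    nearest = sorted(points, key=lambda p: abs(p[0] - query))[:k]
    counts = Counter(label for _, labels in nearest for label in labels)
    return {label for label, c in counts.items() if c / k >= threshold}

# Hypothetical product data: each point carries one or more labels.
points = [(1.0, {'cheap'}), (1.2, {'cheap', 'popular'}),
          (1.1, {'popular'}), (5.0, {'premium'})]

print(sorted(multilabel_knn(points, 1.05, k=3)))  # -> ['cheap', 'popular']
```

Here the query's three nearest neighbors each vote with all of their labels, so the new instance receives both 'cheap' and 'popular'.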
The problem with kNN is that it takes a lot longer than other classification methods to classify a
sample. Nearest neighbor classifier performance depends on calculating the distance function and
on the value of the neighborhood parameter k. You can try to speed things up by choosing an optimal
value for k and by limiting n, the number of training points each query must be compared against.
Exploring common applications of k-nearest neighbor algorithms
kNN is often used in internet database management. In this capacity, kNN is useful for
website categorization, web page ranking, and analyzing other user dynamics across the web.
kNN classification techniques are also quite beneficial in customer relationship management
(CRM), a set of processes that helps a business sustain improved relationships with its clients
while growing its revenues. Most CRM systems benefit tremendously
from using kNN to data-mine customer information for patterns that are useful in
boosting customer retention.
The method is so versatile that even if you’re a small-business owner or a marketing department
manager, you can easily use kNN to boost your own marketing return on investment. Simply use
kNN to analyze your customer data for purchasing patterns, and then use those findings to
customize marketing initiatives so that they’re more precisely targeted to your customer base.