Machine Learning 3. Nearest Neighbor and Kernel Methods - ISMLL

Machine Learning / 3. Parzen Windows

k-Nearest Neighbor is locally constant

In k-nearest neighbor the size of the window varies from point to point: it depends on the density of the data:

in dense parts the effective window size is small,
in sparse parts the effective window size is large.

Alternatively, it is also possible to set the size of the window to a constant λ, e.g.,

K_λ(x, x_0) := 1, if |x − x_0| ≤ λ; 0, otherwise
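This fixed-width window kernel can be sketched in a few lines of Python (the function name is illustrative, not from the slides):

```python
def uniform_kernel(x, x0, lam):
    """Fixed-width Parzen window: weight 1 inside the window of radius lam around x0, else 0."""
    return 1.0 if abs(x - x0) <= lam else 0.0

# Every point within distance lam of x0 gets the same weight.
print(uniform_kernel(1.4, 1.0, 0.5))  # inside the window: 1.0
print(uniform_kernel(2.0, 1.0, 0.5))  # outside the window: 0.0
```

In contrast to k-nearest neighbor, the number of training points receiving nonzero weight now varies with the local data density, while the window width λ stays fixed.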

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute BW/WI & Institute for Computer Science, University of Hildesheim

Course on Machine Learning, winter term 2007, 36/48

Machine Learning / 3. Parzen Windows

Kernel Regression

Instead of discrete windows, one typically uses continuous windows, i.e., continuous weights K(x, x_0) that reflect the distance of a training point x to a prediction point x_0, called a kernel or Parzen window, e.g.,

K(x, x_0) := 1 − |x − x_0| / λ, if |x − x_0| ≤ λ; 0, otherwise
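A minimal Python sketch of this triangular kernel (illustrative names, not from the slides): the weight decays linearly from 1 at x_0 down to 0 at distance λ.

```python
def triangular_kernel(x, x0, lam):
    """Continuous Parzen window: weight falls linearly from 1 at x0 to 0 at distance lam."""
    d = abs(x - x0)
    return 1.0 - d / lam if d <= lam else 0.0

print(triangular_kernel(0.0, 0.0, 1.0))  # at x0: full weight 1.0
print(triangular_kernel(0.5, 0.0, 1.0))  # halfway out: weight 0.5
print(triangular_kernel(2.0, 0.0, 1.0))  # outside the window: 0.0
```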

Instead of a binary neighbor/not-neighbor decision, a continuous kernel captures a “degree of neighborship”.

Kernels can be used for prediction via kernel regression, especially the Nadaraya-Watson kernel-weighted average:

ŷ(x_0) := Σ_{(x,y)∈X} K(x, x_0) y / Σ_{(x,y)∈X} K(x, x_0)
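The Nadaraya-Watson estimator above can be sketched directly in Python; the function and variable names are illustrative, and the triangular kernel from the previous slide is repeated here so the sketch is self-contained:

```python
def triangular_kernel(x, x0, lam):
    """Continuous Parzen window: weight falls linearly from 1 at x0 to 0 at distance lam."""
    d = abs(x - x0)
    return 1.0 - d / lam if d <= lam else 0.0

def nadaraya_watson(x0, data, kernel, lam):
    """Kernel-weighted average of targets: sum K(x, x0) * y / sum K(x, x0) over (x, y) in data."""
    weights = [kernel(x, x0, lam) for x, _ in data]
    total = sum(weights)
    if total == 0.0:  # x0 is farther than lam from every training point
        return None
    return sum(w * y for w, (_, y) in zip(weights, data)) / total

data = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
print(nadaraya_watson(1.0, data, triangular_kernel, lam=1.5))  # ≈ 1.4
```

Note that the estimate is a convex combination of the training targets: nearby points dominate, and points outside every window contribute nothing, which is exactly the “degree of neighborship” idea from the slide.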

