Machine Learning 3. Nearest Neighbor and Kernel Methods - ISMLL

More documents

Recommendations

Info

Machine Learning / 3. Parzen Windows Choosing the Kernelwidth If the kernelwidth λ is small larger variance – as averaged over fewer points smaller bias – as closer instances are used ⇒ risks to be too bumpy If the kernelwidth λ is large smaller variance – as averaged over more points larger bias – as instances further apart are used ⇒ risks to be too rigid / over-smoothed The kernelwidth λ is a parameter (sometimes called a hyperparameter) of the model that needs to be optimized / estimated by data. Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute BW/WI & Institute for Computer Science, University of Hildesheim Course on Machine Learning, winter term 2007 42/48 Machine Learning / 3. Parzen Windows Example / Epanechnikov Kernel, various kernelwidths y −1.0 −0.5 0.0 0.5 1.0 1.5 ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● lambda=1 lambda=0.8 lambda=0.6 lambda=0.4 lambda=0.2 lambda=0.1 lambda=0.05 ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● 0.0 0.2 0.4 0.6 0.8 1.0 x Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute BW/WI & Institute for Computer Science, University of Hildesheim Course on Machine Learning, winter term 2007 43/48
Machine Learning / 3. Parzen Windows Space-averaged Estimates The probability that an instance x is within a given region R ⊆ X : ∫ p(x ∈ R) = p(x)dx R For a sample it is x 1 , x 2 , . . . , x n ∼ p (x i ∈ P ) ∼ binom(p(x ∈ R)) Let k be the number of x i that are in region R: k := |{x i | x i ∈ R, i = 1, . . . , n}| then we can estimate ˆp(x ∈ R) := k n Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute BW/WI & Institute for Computer Science, University of Hildesheim Course on Machine Learning, winter term 2007 44/48 Machine Learning / 3. Parzen Windows Space-averaged Estimates If p is continous and R is very small, p(x) is almost constant in R: ∫ p(x ∈ R) = p(x)dx ≈ p(x) vol(R), for any x ∈ R R where vol(R) denotes the volume of region R. p(x) ≈ k/n vol(R) Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute BW/WI & Institute for Computer Science, University of Hildesheim Course on Machine Learning, winter term 2007 45/48
Page 1 and 2: Machine Learning Machine Learning 3
Page 3 and 4: Machine Learning / 1. Distance Meas
Page 11 and 12: Machine Learning / 2. k-Nearest Nei
Page 21 and 22: Machine Learning 1. Distance Measur
Page 23 and 24: Machine Learning / 3. Parzen Window
Page 25: Machine Learning / 3. Parzen Window
Page 29: Machine Learning / 3. Parzen Window

Machine Learning 3. Nearest Neighbor and Kernel Methods - ISMLL

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?