40.3: Counting threshold functions

[Figure 40.9. The fraction of functions on N points in K dimensions that are linear threshold functions, T(N, K)/2^N, shown from various viewpoints. In (a) we see the dependence on K, which is approximately an error function passing through 0.5 at K = N/2; the fraction reaches 1 at K = N. In (b) we see the dependence on N, which is 1 up to N = K and drops sharply at N = 2K. Panel (c) shows the dependence on N/K for K = 1000. There is a sudden drop in the fraction of realizable labellings when N = 2K. Panel (d) shows the values of log_2 T(N, K) and log_2 2^N as a function of N for K = 1000. These figures were plotted using the approximation of T/2^N by the error function.]

But perhaps we can express T(N, K) as a linear superposition of combination functions of the form C_{\alpha,\beta}(N, K) \equiv \binom{N+\alpha}{K+\beta}. By comparing tables 40.8 and 40.6 we can see how to satisfy the boundary conditions: we simply need to translate Pascal's triangle to the right by 1, 2, 3, ...; superpose; add; multiply by two; and drop the whole table by one line. Thus:

\[
T(N, K) = 2 \sum_{k=0}^{K-1} \binom{N-1}{k}.  \tag{40.8}
\]

Using the fact that the Nth row of Pascal's triangle sums to 2^N, that is, \sum_{k=0}^{N-1} \binom{N-1}{k} = 2^{N-1}, we can simplify the cases where K - 1 \geq N - 1:

\[
T(N, K) =
\begin{cases}
2^N & K \geq N \\[4pt]
2 \sum_{k=0}^{K-1} \binom{N-1}{k} & K < N.
\end{cases}  \tag{40.9}
\]

Interpretation

It is natural to compare T(N, K) with the total number of binary functions on N points, 2^N. The ratio T(N, K)/2^N tells us the probability that an arbitrary labelling \{t_n\}_{n=1}^N can be memorized by our neuron. The two functions are equal for all N \leq K. The line N = K is thus a special line, defining the maximum number of points on which any arbitrary labelling can be realized. This number of points is referred to as the Vapnik–Chervonenkis dimension (VC dimension) of the class of functions. The VC dimension of a binary threshold function on K dimensions is thus K.
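As a quick numerical check of equations (40.8)–(40.9), here is a minimal Python sketch (the function name count_threshold_functions is ours, not from the text) that computes T(N, K) exactly and prints the fraction T/2^N:

```python
from math import comb

def count_threshold_functions(N: int, K: int) -> int:
    """T(N, K) from equation (40.9): the number of labellings of N points
    in general position in K dimensions realizable by a linear threshold
    function (through the origin)."""
    if K >= N:
        return 2 ** N                       # all 2^N labellings are realizable
    return 2 * sum(comb(N - 1, k) for k in range(K))   # equation (40.8)

for N, K in [(3, 10), (10, 10), (20, 10), (30, 10)]:
    T = count_threshold_functions(N, K)
    print(f"N={N:2d} K={K}  T/2^N = {T / 2**N:.4f}")
# The fraction is 1.0 for N <= K and exactly 0.5 at N = 2K,
# matching the curves in figure 40.9.
```

Note that at N = 2K the fraction is exactly 1/2, since the sum in (40.8) then covers exactly half of row N-1 of Pascal's triangle; this is the crossing point visible in panels (a) and (c) of figure 40.9.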
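The figure caption notes that T/2^N is well approximated by an error function. One way to see this (our reading; not necessarily the exact approximation used to plot the figure) is that T/2^N equals the probability that a Binomial(N-1, 1/2) variable is at most K-1, which the central limit theorem approximates by a Gaussian cumulative distribution:

```python
from math import comb, erf, sqrt

def exact_fraction(N: int, K: int) -> float:
    # T(N, K)/2^N = P(X <= K-1) for X ~ Binomial(N-1, 1/2)
    if K >= N:
        return 1.0
    return sum(comb(N - 1, k) for k in range(K)) / 2 ** (N - 1)

def erf_approximation(N: int, K: int) -> float:
    # Gaussian approximation with continuity correction:
    # P(X <= K-1) ~ Phi((K - 0.5 - (N-1)/2) / (sqrt(N-1)/2))
    #             = Phi((2K - N) / sqrt(N-1))
    z = (2 * K - N) / sqrt(N - 1)
    return 0.5 * (1 + erf(z / sqrt(2)))

N = 1000
for K in [400, 450, 500, 550, 600]:
    print(f"K={K}: exact={exact_fraction(N, K):.4f}  "
          f"approx={erf_approximation(N, K):.4f}")
# Both pass through 0.5 at K = N/2, as in panel (a) of figure 40.9.
```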
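Finally, a concrete illustration of the closing claim that K points in K dimensions can be labelled arbitrarily. This is our construction, not the text's: place the points at the standard basis vectors (which are in general position); then for any labelling {t_n}, the weight vector with w_k = +1 where t_k = 1 and w_k = -1 where t_k = 0 realizes it exactly.

```python
from itertools import product

K = 4
# K points at the standard basis vectors e_1, ..., e_K of R^K
points = [[1.0 if i == k else 0.0 for i in range(K)] for k in range(K)]

def threshold(w, x):
    # binary threshold neuron through the origin: y = 1 if w.x > 0 else 0
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# Every one of the 2^K labellings is realized by some weight vector,
# since w.e_k = w_k picks out the k-th weight directly.
for labelling in product([0, 1], repeat=K):
    w = [1.0 if t == 1 else -1.0 for t in labelling]
    assert all(threshold(w, x) == t for x, t in zip(points, labelling))
print(f"All {2**K} labellings of {K} points in {K} dimensions realized.")
```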
