Foundations of Data Science

$$
\underbrace{\Bigl(\;A\;\Bigr)}_{\text{customers}\,\times\,\text{movies}}
\;=\;
\underbrace{\Bigl(\;U\;\Bigr)}_{\text{customers}\,\times\,\text{factors}}
\underbrace{\Bigl(\;V\;\Bigr)}_{\text{factors}\,\times\,\text{movies}}
$$

Figure 3.3: Customer-movie data

of an n × k matrix U describing the customers and a k × d matrix V describing the movies.

Finding the best rank-k approximation A_k by SVD gives such a U and V. One twist is that A may not be exactly equal to UV, in which case A − UV is treated as noise. Another issue is that SVD gives a factorization that may contain negative entries. Non-negative Matrix Factorization (NMF) is more appropriate in contexts where we want to keep the entries non-negative. NMF is discussed in Section 8.13.
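As an illustrative sketch (not from the text; sizes and noise level are made up), the following NumPy snippet builds a low-rank-plus-noise matrix, extracts the factor matrices from the truncated SVD, and shows both points above: A − UV is just the small noise, and the SVD factors generally have negative entries.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 40, 3  # customers, movies, hidden factors (illustrative sizes)

# Build a rank-k "true" matrix plus a little noise.
U_true = rng.standard_normal((n, k))
V_true = rng.standard_normal((k, d))
A = U_true @ V_true + 0.01 * rng.standard_normal((n, d))

# Best rank-k approximation via SVD: A_k = U_k diag(s_k) V_k^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_fac = U[:, :k] * s[:k]   # n x k matrix playing the role of U
V_fac = Vt[:k, :]          # k x d matrix playing the role of V
A_k = U_fac @ V_fac

# A - UV is small relative to A: it is the noise outside the top-k subspace.
residual = np.linalg.norm(A - A_k) / np.linalg.norm(A)

# The SVD factors are not sign-constrained, which is why NMF is preferred
# when the factor entries themselves must be non-negative.
has_negative = bool((U_fac < 0).any() or (V_fac < 0).any())
print(residual, has_negative)
```

Here the relative residual is tiny because A is nearly rank k, while the factors do contain negative entries.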

In the above setting, A was fully available and we wished to find U and V in order to identify the basic factors. However, in a setting such as movie recommendations, each customer may have seen only a small fraction of the movies, so it may be more natural to assume that we are given just a few entries of A and wish to estimate all of A. If A were an arbitrary n × d matrix, this would require Ω(nd) pieces of information and could not be done from a few entries. But suppose again that A is a low-rank matrix with added noise. If we also assume that the given entries are drawn at random according to some known distribution, then there is a possibility that SVD can be used to estimate the whole of A. This area is called collaborative filtering; one of its uses is to recommend movies or to target an ad to a customer based on one or two purchases. We do not describe it here.
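A minimal sketch of the idea (assuming the simplest sampling model, where each entry is revealed independently with known probability p; all sizes are made up): rescale the zero-filled observed matrix by 1/p so it is an unbiased estimate of A, then project onto the top-k singular subspace.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k, p = 300, 150, 2, 0.5  # sizes, rank, sampling probability (illustrative)

# Low-rank ground truth A.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))

# Reveal each entry independently with probability p; zeros elsewhere.
mask = rng.random((n, d)) < p
A_obs = np.where(mask, A, 0.0)

# A_obs / p is an entrywise unbiased estimate of A; truncating its SVD
# to rank k filters out most of the sampling noise.
U, s, Vt = np.linalg.svd(A_obs / p, full_matrices=False)
A_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]

err_hat = np.linalg.norm(A_hat - A) / np.linalg.norm(A)
err_raw = np.linalg.norm(A_obs - A) / np.linalg.norm(A)
print(err_hat, err_raw)  # the rank-k projection beats the raw zero-filled data
```

This is only the starting point of matrix completion, not a full collaborative-filtering method, but it shows why low rank plus random sampling makes estimating all of A from a few entries plausible.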

3.9.3 Clustering a Mixture of Spherical Gaussians

Clustering is the task of partitioning a set of points into k subsets or clusters, where each cluster consists of "nearby" points. Different definitions of the quality of a clustering lead to different solutions. Clustering is an important area which we will study in detail in Chapter ??. Here we will see how to solve a particular clustering problem using singular value decomposition.

Mathematical formulations of clustering tend to have the property that finding the highest-quality solution for a given set of data is NP-hard. One way around this is to assume stochastic models of the input data and devise algorithms to cluster data generated by

