Foundations of Data Science

Note that ∇_{w_i} J(W) is a vector. Since w_i is a vector, each component of ∇_{w_i} J(W) is the derivative with respect to one component of the vector w_i.
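To make this concrete, here is a minimal sketch that estimates each component of such a gradient by finite differences. The cost function J below is a toy stand-in, not one from the text; it simply illustrates that the gradient with respect to a weight vector is itself a vector of componentwise derivatives.

```python
import numpy as np

def J(w):
    # Toy cost: a quadratic in the weight vector w (a stand-in for the
    # network cost J(W) restricted to one weight vector w_i).
    return 0.5 * np.sum(w ** 2) + w[0] * w[1]

def numerical_gradient(f, w, eps=1e-6):
    # Each component of the gradient is the derivative of f with respect
    # to one component of w, estimated here by a central difference.
    grad = np.zeros_like(w)
    for k in range(len(w)):
        e = np.zeros_like(w)
        e[k] = eps
        grad[k] = (f(w + e) - f(w - e)) / (2 * eps)
    return grad

w = np.array([1.0, 2.0, 3.0])
g = numerical_gradient(J, w)   # analytic gradient is [w0 + w1, w1 + w0, w2]
```

Such a finite-difference check is a standard way to verify a hand-derived backpropagation gradient on a small example.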

Overfitting is a major concern in deep learning, since large networks can have hundreds of millions of weights. In image recognition, the number of training images can be significantly increased by randomly jittering the images. Another technique, called dropout, randomly deletes a fraction of the weights at each training iteration. Regularization assigns a cost to the size of the weights, and many other ideas are being explored.
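The three ideas above can be sketched in a few lines. This is an illustrative toy, not a production recipe: the function names, the shift-based jitter, and the weight-masking form of dropout (following the text's description of deleting weights) are all simplified assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(image, max_shift, rng):
    # Random translation ("jittering") of an image array, one cheap way
    # to enlarge a set of training images.
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(np.roll(image, dx, axis=0), dy, axis=1)

def dropout_mask(shape, drop_fraction, rng):
    # Randomly zero out a fraction of the weights for one training
    # iteration; survivors are rescaled so expected values are unchanged.
    keep = 1.0 - drop_fraction
    return (rng.random(shape) < keep) / keep

def regularized_cost(base_cost, W, lam):
    # Regularization: add a penalty proportional to the squared size of
    # the weights.
    return base_cost + lam * np.sum(W ** 2)

image = rng.standard_normal((28, 28))
augmented = jitter(image, max_shift=2, rng=rng)

W = rng.standard_normal((4, 3))
mask = dropout_mask(W.shape, drop_fraction=0.5, rng=rng)
W_used = W * mask                        # weights used this iteration
cost = regularized_cost(1.25, W, lam=0.01)
```

In practice each training iteration would draw a fresh jitter and a fresh dropout mask, so the network never sees exactly the same configuration twice.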

Deep learning is an active research area. One question being explored is what individual gates, or sets of gates, learn. If one trains a network twice, starting from random sets of weights, do the gates learn the same features? In image recognition, the early convolution layers seem to learn features of images in general rather than features of the specific set of images they are being trained on. Once a network is trained, say on a set of images one of which is a cat, one can freeze the weights and then find other images that map to the activation vector generated by the cat image. One can take an artwork image, separate the style from the content, and then create an image using the content but a different style []. This is done by taking the activation of the original image and moving it to the manifold of activation vectors of images of a given style. One can do many things of this type; for example, one can change the age of a child in an image, or change some other feature []. For more information about deep learning, see [?]. 27

6.14 Further Current directions<br />

We now briefly discuss a few additional current directions in machine learning, focusing on semi-supervised learning, active learning, and multi-task learning.

6.14.1 Semi-supervised learning<br />

Semi-supervised learning refers to the idea of trying to use a large unlabeled data set U to augment a given labeled data set L in order to produce more accurate rules than would have been achieved using just L alone. The motivation is that in many settings (e.g., document classification, image classification, speech recognition), unlabeled data is much more plentiful than labeled data, so one would like to make use of it if possible. Of course, unlabeled data is missing the labels! Nonetheless it often contains information that an algorithm can take advantage of.
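One simple way an algorithm can take advantage of unlabeled data is self-training: fit a rule to L, pseudo-label the points of U on which the rule is most confident, and refit. A minimal sketch under toy assumptions: one-dimensional data, a threshold classifier halfway between the class means, and an illustrative distance-based confidence rule (none of these specifics come from the text).

```python
import numpy as np

rng = np.random.default_rng(2)

# Unlabeled set U: two well-separated clusters. Labeled set L: two points.
U = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200)])
L_x = np.array([-1.8, 2.1])
L_y = np.array([0, 1])

def fit_threshold(xs, ys):
    # Toy rule: threshold halfway between the two class means.
    return (xs[ys == 0].mean() + xs[ys == 1].mean()) / 2

t = fit_threshold(L_x, L_y)
for _ in range(5):
    # Pseudo-label only the unlabeled points the current rule is most
    # confident about (far from the threshold), then refit on L plus
    # these pseudo-labeled points.
    conf = np.abs(U - t) > 1.0
    pseudo_y = (U[conf] > t).astype(int)
    t = fit_threshold(np.concatenate([L_x, U[conf]]),
                      np.concatenate([L_y, pseudo_y]))
```

With only the two labeled points the threshold is skewed by noise; the unlabeled clusters pull it toward the low-density region between the classes, which is the intuition behind the margin-based example that follows.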

As an example, suppose one believes the target function is a linear separator that separates most of the data by a large margin. By observing enough unlabeled data to estimate the probability mass near to any given linear separator, one could in principle then

27 See also the tutorials: http://deeplearning.net/tutorial/deeplearning.pdf and http://deeplearning.stanford.edu/tutorial/.

