Deep-Learning-with-PyTorch

Semantic segmentation: Per-pixel classification


[Figure: U-Net architecture diagram, annotated with "skip connections," "upsampling," "classification," "magnifying glass," and "could be fed into linear layer"]

Figure 13.7 From the U-Net paper, with annotations. Source: The base of this figure is courtesy of Olaf Ronneberger et al., from the paper "U-Net: Convolutional Networks for Biomedical Image Segmentation," which can be found at https://arxiv.org/abs/1505.04597 and https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.

resolutions at which the network operates. In the top row is the full resolution (512 × 512 for us), the row below has half that, and so on. The data flows from top left to bottom center through a series of convolutions and downscaling, as we saw in the classifiers and looked at in detail in chapter 8. Then we go up again, using upscaling convolutions to get back to the full resolution. Unlike the original U-Net, we will be padding things so we don't lose pixels off the edges, so our resolution is the same on the left and on the right.
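The padding point is easy to see with two convolutions side by side; this is a minimal sketch (not code from the book), using arbitrary channel counts:

```python
import torch
import torch.nn as nn

# A 3x3 convolution with padding=1 keeps the spatial resolution unchanged,
# whereas the unpadded ("valid") convolutions of the original U-Net shave
# 2 pixels off each spatial dimension per layer.
padded_conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
unpadded_conv = nn.Conv2d(1, 16, kernel_size=3, padding=0)

x = torch.randn(1, 1, 512, 512)
print(padded_conv(x).shape)    # torch.Size([1, 16, 512, 512])
print(unpadded_conv(x).shape)  # torch.Size([1, 16, 510, 510])
```

With padded convolutions, the output segmentation map can line up pixel for pixel with the input image, which is why our left and right resolutions match.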

Earlier network designs already had this U-shape, which people attempted to use to address the limited receptive field size of fully convolutional networks. To address this limited field size, they used a design that copied, inverted, and appended the focusing portions of an image-classification network to create a symmetrical model that goes from fine detail to wide receptive field and back to fine detail.
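A toy two-level U-shape makes the structure concrete. This is an illustrative sketch, not the model built in this chapter; the class name, channel counts, and layer choices are all arbitrary, but the skip connection (concatenating full-resolution encoder features into the decoder) is the ingredient U-Net adds on top of the plain symmetric design:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Illustrative two-level U-shape: one downscaling step, one upscaling
    step, and a skip connection that concatenates the full-resolution
    encoder features into the decoder. Hypothetical example, not the
    book's model."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.down = nn.MaxPool2d(2)
        self.bottom = nn.Conv2d(8, 8, kernel_size=3, padding=1)
        self.up = nn.ConvTranspose2d(8, 8, kernel_size=2, stride=2)
        self.dec = nn.Conv2d(16, 1, kernel_size=3, padding=1)  # 16 = 8 skip + 8 upsampled

    def forward(self, x):
        fine = torch.relu(self.enc(x))                     # full resolution
        coarse = torch.relu(self.bottom(self.down(fine)))  # half resolution
        upsampled = self.up(coarse)                        # back to full resolution
        merged = torch.cat([fine, upsampled], dim=1)       # the skip connection
        return self.dec(merged)                            # per-pixel output

out = TinyUNet()(torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```

Dropping the `torch.cat` line (and shrinking `dec` to 8 input channels) yields the earlier, skip-free symmetric design that the next paragraph describes.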

Those earlier network designs had problems converging, however, most likely due to the loss of spatial information during downsampling. Once information reaches a large number of very downscaled images, the exact location of object boundaries gets
