
The conversion back is a bit pedestrian and full of copying (from the tensor to a float[] array, to an int[] array containing ARGB values, and finally to the bitmap), but it is what it is. It is designed to be the inverse of bitmapToFloat32Tensor.
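To make the chain of copies concrete, here is a rough Python stand-in for what that conversion does (this is not the Java helper from the app; the function name and the assumption of a 3 x H x W float output in [0, 1] are ours):

import torch
from PIL import Image

def tensor_to_image(t):
    # The model output is a 3 x H x W float tensor; clamp it to the displayable range.
    t = t.detach().clamp(0.0, 1.0)
    # float -> 8-bit integers, channels-last (H x W x 3), the layout bitmaps expect
    rgb = (t * 255.0).to(torch.uint8).permute(1, 2, 0).cpu().numpy()
    # Wrap the raw pixel values in an image object (the bitmap stand-in here).
    return Image.fromarray(rgb, mode="RGB")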

And that’s all we need to do to get PyTorch into Android. Using the minimal additions to the code we left out here to request a picture, we have a Zebraify Android app that looks like what we see in figure 15.5. Well done! 16

We should note that we end up with a full version of PyTorch with all ops on Android. This will, in general, also include operations you will not need for a given task, leading to the question of whether we could save some space by leaving them out. It turns out that starting with PyTorch 1.4, you can build a customized version of the PyTorch library that includes only the operations you need (see https://pytorch.org/mobile/android/#custom-build).
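As a sketch of what such a custom build involves (the file names here are made up, and the exact build invocation may differ between releases, so treat the page linked above as authoritative), the idea is to export the list of operators your traced model actually uses and feed that list to the Android build scripts:

import torch
import yaml

# Load the traced model we ship and list the operators it uses.
model = torch.jit.load("zebraify.pt")
op_names = torch.jit.export_opnames(model)   # e.g. ['aten::conv2d', ...]

with open("zebraify_ops.yaml", "w") as f:
    yaml.dump(op_names, f)

# The YAML file is then passed to the Android build, roughly:
#   SELECTED_OP_LIST=zebraify_ops.yaml ./scripts/build_pytorch_android.sh arm64-v8a
# which produces AAR libraries containing only the listed operators.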

15.5.1 Improving efficiency: Model design and quantization

If we want to explore mobile in more detail, our next step is to try to make our models faster.

Figure 15.5 Our CycleGAN zebra app

When we wish to reduce the memory and compute footprint of our models, the first thing to look at is streamlining the model itself: that is, computing the same or very similar mappings from inputs to outputs with fewer parameters and operations. This is often called distillation. The details of distillation vary: sometimes we try to shrink the model by eliminating small or irrelevant weights; 17 in other cases, we combine several layers of a net into one (DistilBERT) or even train a fully different, simpler model to reproduce the larger model’s outputs (OpenNMT’s original CTranslate). We mention this because these modifications are likely to be the first step in getting models to run faster.
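As a sketch of the weight-elimination flavor mentioned above, PyTorch ships pruning utilities that zero out the smallest weights of a layer (the toy model and the 30% ratio here are arbitrary choices for illustration):

import torch
from torch import nn
from torch.nn.utils import prune

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out the 30% smallest-magnitude weights of each linear layer (L1 pruning).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")   # make the pruning permanent

print((model[0].weight == 0).float().mean().item())   # roughly 0.3

Note that the zeroed weights only translate into memory and compute savings if they are stored and executed in a sparse form.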

Another approach is to reduce the footprint of each parameter and operation: instead of expending the usual 32 bits per parameter in the form of a float, we convert our model to work with integers (a typical choice is 8-bit). This is quantization. 18
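A minimal sketch of what this looks like in PyTorch is dynamic quantization, which converts the weights of selected layer types to 8-bit integers in a single call (the toy model is ours; how much you gain depends on which layers dominate your network):

import torch
from torch import nn

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Replace the Linear layers with versions that store int8 weights and
# quantize activations on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 256)
print(quantized(x).shape)   # same interface, roughly a quarter of the weight bytes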

16 At the time of writing, PyTorch Mobile is still relatively young, and you may hit rough edges. On PyTorch 1.3, the colors were off on an actual 32-bit ARM phone even though they were fine in the emulator. The reason is likely a bug in one of the computational backend functions that are only used on ARM. With PyTorch 1.4 and a newer phone (64-bit ARM), it seemed to work better.
17 Examples include the Lottery Ticket Hypothesis and WaveRNN.
18 In contrast to quantization, (partially) moving to 16-bit floating-point for training is usually called reduced- or (if some parts stay 32-bit) mixed-precision training.
