
Not necessarily, no. If you're using transfer learning, for instance, this is pretty much a non-issue, because most of the model is already trained, and a bad initialization of the trainable part should have little to no impact on model training. Besides, as we'll see in a short while, using batch normalization layers makes your model much more forgiving when it comes to a bad initialization of the weights.

"What about PyTorch’s defaults? Can’t I simply trust them?"

Trust, but verify. Each PyTorch layer has its own default initialization of the weights in the reset_parameters() method. For instance, the nn.Linear layer is initialized using the Kaiming (He) scheme drawn from a uniform distribution:

# nn.Linear.reset_parameters()
def reset_parameters(self) -> None:
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)

Moreover, it also initializes the biases based on the "fan-in," which is simply the number of units in the preceding layer.
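
To make the "fan-in" rule concrete, here is a minimal sketch (the layer sizes are made up for illustration) that reproduces the bias bound for a hypothetical nn.Linear(10, 5) layer:

import math
import torch.nn as nn

# Hypothetical layer, for illustration only: 10 inputs, 5 outputs
layer = nn.Linear(10, 5)

# "fan-in" = number of units in the preceding layer = in_features = 10
fan_in = layer.weight.size(1)
bound = 1 / math.sqrt(fan_in)  # 1 / sqrt(10), roughly 0.316

# The default initialization drew the biases from U(-bound, bound)
print(layer.bias.min().item() >= -bound, layer.bias.max().item() <= bound)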

IMPORTANT: Every default initialization has its own assumptions, and in this particular case it is assumed (in the reset_parameters() method) that the nn.Linear layer will be followed by a leaky ReLU (the default value for the nonlinearity argument in the Kaiming initialization) with a negative slope equal to the square root of five (the "a" argument in the Kaiming initialization).
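
To see what that assumption implies, the short sketch below (purely illustrative) uses calculate_gain() to compare the gain produced by the default combination, a leaky ReLU with negative slope equal to the square root of five, with the gain a regular ReLU would call for:

import math
import torch.nn as nn

# Gain implied by the default: leaky ReLU with negative slope sqrt(5)
default_gain = nn.init.calculate_gain('leaky_relu', math.sqrt(5))  # sqrt(2/6) ~ 0.577

# Gain corresponding to a regular ReLU
relu_gain = nn.init.calculate_gain('relu')  # sqrt(2) ~ 1.414

print(default_gain, relu_gain)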

If your model does not follow these assumptions, you may run into problems. For instance, our model used a regular ReLU instead of a leaky one, so the default initialization scheme was off and we ended up with vanishing gradients.
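
One possible way out, sketched below under the assumption that every linear layer in the model is followed by a regular ReLU (the init_for_relu() helper is hypothetical, not part of the book's code), is to re-initialize the weights yourself and tell the Kaiming scheme which nonlinearity actually comes next:

import torch.nn as nn

def init_for_relu(m):
    # Hypothetical helper: override the default scheme for linear layers
    # that are followed by a regular (non-leaky) ReLU
    if isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Hypothetical model, just to show the apply() pattern
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
model.apply(init_for_relu)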

"How am I supposed to know that?"

Unfortunately, there is no easy way around it. You may inspect a layer's reset_parameters() method and figure out its assumptions from the code (like we just did).
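
One practical way of doing that inspection, sketched below, is to print the method's source code using Python's inspect module:

import inspect
import torch.nn as nn

# Show the actual code of nn.Linear's reset_parameters() method
print(inspect.getsource(nn.Linear.reset_parameters))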
