Neural Networks¶

Neural networks can be constructed using the torch.nn package.

Now that you have had a glimpse of autograd, note that nn depends on autograd to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the output.

For example, look at this network that classifies digit images:

[figure: convnet]

It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.

A typical training procedure for a neural network is as follows:

  • Define the neural network that has some learnable parameters (or weights)
  • Iterate over a dataset of inputs
  • Process input through the network
  • Compute the loss (how far is the output from being correct)
  • Propagate gradients back into the network’s parameters
  • Update the weights of the network, typically using a simple update rule:

    weight = weight - learning_rate * gradient
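
As a preview of these steps, here is a minimal, self-contained sketch of that loop. The toy dataset, the nn.Linear model, and the hyperparameters are illustrative assumptions only; they are not the network defined below.

    import torch
    import torch.nn as nn

    # a toy dataset of random (input, target) pairs -- purely illustrative
    dataset = [(torch.randn(1, 4), torch.randn(1, 2)) for _ in range(8)]

    model = nn.Linear(4, 2)               # a network with learnable parameters
    criterion = nn.MSELoss()              # how far is the output from being correct?
    learning_rate = 0.01

    for input, target in dataset:         # iterate over a dataset of inputs
        output = model(input)             # process input through the network
        loss = criterion(output, target)  # compute the loss
        model.zero_grad()                 # clear old gradients
        loss.backward()                   # propagate gradients back into the parameters
        with torch.no_grad():             # weight = weight - learning_rate * gradient
            for p in model.parameters():
                p -= learning_rate * p.grad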

Define the network¶

Let’s define this network:

                  
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class Net(nn.Module):

        def __init__(self):
            super(Net, self).__init__()
            # 1 input image channel, 6 output channels, 3x3 square convolution
            # kernel
            self.conv1 = nn.Conv2d(1, 6, 3)
            self.conv2 = nn.Conv2d(6, 16, 3)
            # an affine operation: y = Wx + b
            self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            # Max pooling over a (2, 2) window
            x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
            # If the size is a square you can only specify a single number
            x = F.max_pool2d(F.relu(self.conv2(x)), 2)
            x = x.view(-1, self.num_flat_features(x))
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x

        def num_flat_features(self, x):
            size = x.size()[1:]  # all dimensions except the batch dimension
            num_features = 1
            for s in size:
                num_features *= s
            return num_features


    net = Net()
    print(net)

Out:

    Net(
      (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
      (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
      (fc1): Linear(in_features=576, out_features=120, bias=True)
      (fc2): Linear(in_features=120, out_features=84, bias=True)
      (fc3): Linear(in_features=84, out_features=10, bias=True)
    )
                

You just have to define the forward function; the backward function (where gradients are computed) is automatically defined for you using autograd. You can use any of the Tensor operations in the forward function.

The learnable parameters of a model are returned by net.parameters():

                  
    params = list(net.parameters())
    print(len(params))
    print(params[0].size())  # conv1's .weight

Out:

    10
    torch.Size([6, 1, 3, 3])
                

Let's try a random 32×32 input. Note: the expected input size of this net (LeNet) is 32×32. To use this net on the MNIST dataset, please resize the images from the dataset to 32×32.

                  
    input = torch.randn(1, 1, 32, 32)
    out = net(input)
    print(out)

Out:

                  tensor([[-0.0193,  0.1366, -0.1581, -0.0566, -0.0611,  0.1434,  0.0975,  0.0029,           0.0196,  0.1188]], grad_fn=<AddmmBackward>)
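
As a side note on the MNIST resizing mentioned above, one possible sketch, assuming torchvision is installed (it is not used anywhere else in this tutorial):

    import torchvision
    import torchvision.transforms as transforms

    # resize MNIST's 28x28 images to the 32x32 input size this net expects
    transform = transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
    ])
    mnist = torchvision.datasets.MNIST(root='./data', train=True,
                                       download=True, transform=transform)
    image, label = mnist[0]
    print(image.size())  # torch.Size([1, 32, 32])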
                

Zero the gradient buffers of all parameters and backprop with random gradients:

                  
    net.zero_grad()
    out.backward(torch.randn(1, 10))

Note


torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, and not a single sample.

For example, nn.Conv2d will take in a 4D Tensor of nSamples x nChannels x Height x Width.

If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.
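
For instance, a short sketch of adding that fake batch dimension (reusing torch and net from above):

    single = torch.randn(1, 32, 32)   # one sample: nChannels x Height x Width
    batched = single.unsqueeze(0)     # now nSamples x nChannels x Height x Width
    print(batched.size())             # torch.Size([1, 1, 32, 32])
    out = net(batched)                # acceptable to nn.Conv2d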

Before proceeding further, let’s recap all the classes you’ve seen so far.

Recap:

  • torch.Tensor – A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.
  • nn.Module – Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
  • nn.Parameter – A kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module (see the short sketch after this recap).
  • autograd.Function – Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to the functions that created a Tensor and encodes its history.
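
To make the nn.Parameter point concrete, here is a small sketch; the WithScale module is hypothetical, not part of this tutorial:

    class WithScale(nn.Module):
        def __init__(self):
            super(WithScale, self).__init__()
            # assigning an nn.Parameter as an attribute registers it automatically
            self.scale = nn.Parameter(torch.ones(3))

    m = WithScale()
    print(list(m.parameters()))  # the 'scale' tensor is listed as a parameter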
At this point, we covered:
  • Defining a neural network
  • Processing inputs and calling backward
Still Left:
  • Computing the loss
  • Updating the weights of the network

Loss Function¶

A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.

There are several different loss functions under the nn package. A simple loss is nn.MSELoss, which computes the mean-squared error between the input and the target.

For example:

                  
    output = net(input)
    target = torch.randn(10)  # a dummy target, for example
    target = target.view(1, -1)  # make it the same shape as output
    criterion = nn.MSELoss()

    loss = criterion(output, target)
    print(loss)

Out:

                  tensor(1.6418, grad_fn=<MseLossBackward>)
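
With its default settings, nn.MSELoss is just the mean of the squared element-wise differences; as a quick sanity-check sketch (not part of the original script), the same value can be recomputed by hand from output and target:

    manual_loss = ((output - target) ** 2).mean()
    print(manual_loss)  # matches the value printed above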
                

Now, if you follow loss in the backward direction, using its .grad_fn attribute, you will see a graph of computations that looks like this:

                  
    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
          -> view -> linear -> relu -> linear -> relu -> linear
          -> MSELoss
          -> loss

So, when we call loss.backward(), the whole graph is differentiated w.r.t. the loss, and all Tensors in the graph that have requires_grad=True will have their .grad Tensor accumulated with the gradient.

For illustration, let us follow a few steps backward:

                  
    print(loss.grad_fn)  # MSELoss
    print(loss.grad_fn.next_functions[0][0])  # Linear
    print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU

Out:

    <MseLossBackward object at 0x7f70e59142b0>
    <AddmmBackward object at 0x7f70e5914a90>
    <AccumulateGrad object at 0x7f70e5914a90>
                

Backprop¶

To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first though, or the new gradients will be accumulated into the existing ones.

Now we shall call loss.backward(), and have a look at conv1's bias gradients before and after the backward.

                  
    net.zero_grad()     # zeroes the gradient buffers of all parameters

    print('conv1.bias.grad before backward')
    print(net.conv1.bias.grad)

    loss.backward()

    print('conv1.bias.grad after backward')
    print(net.conv1.bias.grad)

Out:

    conv1.bias.grad before backward
    tensor([0., 0., 0., 0., 0., 0.])
    conv1.bias.grad after backward
    tensor([-0.0133, -0.0196,  0.0083, -0.0036,  0.0360,  0.0092])
                

Now, we have seen how to use loss functions.

Read Later:

The neural network package contains various modules and loss functions that form the building blocks of deep neural networks. A full list with documentation is here.

The only thing left to learn is:

  • Updating the weights of the network

Update the weights¶

The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):


    weight = weight - learning_rate * gradient

We can implement this using simple Python code:

                  
    learning_rate = 0.01
    for f in net.parameters():
        f.data.sub_(f.grad.data * learning_rate)

However, as you use neural networks, you want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, we built a small package, torch.optim, that implements all these methods. Using it is very simple:

                  
    import torch.optim as optim

    # create your optimizer
    optimizer = optim.SGD(net.parameters(), lr=0.01)

    # in your training loop:
    optimizer.zero_grad()   # zero the gradient buffers
    output = net(input)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()    # Does the update

Note

Observe how the gradient buffers have to be manually set to zero using optimizer.zero_grad(). This is because gradients are accumulated, as explained in the Backprop section.
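
As a small sketch of that accumulation behaviour (reusing net, input, target and criterion from above; not part of the original script), running backward twice without zeroing doubles the stored gradient:

    net.zero_grad()                      # start from clean gradient buffers
    loss = criterion(net(input), target)
    loss.backward()
    first = net.conv1.bias.grad.clone()  # gradient after one backward pass

    loss = criterion(net(input), target)
    loss.backward()                      # second backward, no zero_grad() in between
    print(torch.allclose(net.conv1.bias.grad, 2 * first))  # True: gradients added up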

Total running time of the script: (0 minutes 0.203 seconds)
