## Neural Networks

Neural networks can be constructed using the `torch.nn` package.

Now that you have had a glimpse of `autograd`, `nn` depends on `autograd` to define models and differentiate them. An `nn.Module` contains layers, and a method `forward(input)` that returns the `output`.

For example, look at this network that classifies digit images:

(figure: the LeNet convnet architecture)

It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.

A typical training procedure for a neural network is as follows:

• Define the neural network that has some learnable parameters (or weights)
• Iterate over a dataset of inputs
• Process input through the network
• Compute the loss (how far is the output from being correct)
• Propagate gradients back into the network’s parameters
• Update the weights of the network, typically using a simple update rule: `weight = weight - learning_rate * gradient`
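To make these steps concrete before we build the real network, here is a minimal, self-contained sketch of one training step on a toy one-layer model. The model and data here are hypothetical stand-ins for illustration, not the convnet defined below:

```
import torch
import torch.nn as nn

# hypothetical toy model and data, just to illustrate the procedure above
model = nn.Linear(4, 2)
inputs = torch.randn(8, 4)          # a mini-batch of 8 inputs
targets = torch.randn(8, 2)         # matching dummy targets
criterion = nn.MSELoss()
learning_rate = 0.01

output = model(inputs)              # process input through the network
loss = criterion(output, targets)   # compute the loss
model.zero_grad()                   # clear any existing gradients
loss.backward()                     # propagate gradients back into the parameters
with torch.no_grad():
    for weight in model.parameters():
        weight -= learning_rate * weight.grad   # weight = weight - lr * gradient
```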

## Define the network

Let’s define this network:

```
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)
```

Out:

```
Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
```

You just have to define the `forward` function, and the `backward` function (where gradients are computed) is automatically defined for you using `autograd`. You can use any of the Tensor operations in the `forward` function.

The learnable parameters of a model are returned by `net.parameters()`.

```
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
```

Out:

```
10
torch.Size([6, 1, 3, 3])
```
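If you want to see every parameter rather than just the first, `named_parameters()` pairs each tensor with its name. This small check is an addition to the original code:

```
# optional: list each learnable parameter of `net` with its shape
for name, param in net.named_parameters():
    print(name, param.size())
```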

Let’s try a random 32×32 input. Note: the expected input size of this net (LeNet) is 32×32. To use this net on the MNIST dataset, please resize the images from the dataset to 32×32.

```
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
```

Out:

```
tensor([[-0.0193,  0.1366, -0.1581, -0.0566, -0.0611,  0.1434,  0.0975,  0.0029,
          0.0196,  0.1188]], grad_fn=<AddmmBackward>)
```
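As an aside to the MNIST note above: one way to get 32×32 inputs from 28×28 MNIST digits is torchvision's `Resize` transform. This is a sketch assuming torchvision is installed; it is not otherwise used in this tutorial:

```
import torchvision.transforms as transforms

# resize 28x28 digit images to the 32x32 input LeNet expects
transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),   # PIL image -> 1x32x32 float tensor
])
# e.g. pass transform=transform to torchvision.datasets.MNIST
```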

Zero the gradient buffers of all parameters and backprop with random gradients:

```
net.zero_grad()
out.backward(torch.randn(1, 10))
```

Note

`torch.nn` only supports mini-batches. The entire `torch.nn` package only supports inputs that are a mini-batch of samples, and not a single sample.

For example, `nn.Conv2d` will take in a 4D Tensor of `nSamples x nChannels x Height x Width`.

If you have a single sample, just use `input.unsqueeze(0)` to add a fake batch dimension.
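For instance, a single 1×32×32 image can be given a fake batch dimension like this (a small illustration using the `net` defined above, not part of the original code):

```
single = torch.randn(1, 32, 32)   # one sample: channels x height x width
batched = single.unsqueeze(0)     # shape becomes 1 x 1 x 32 x 32
out = net(batched)
```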

Before proceeding further, let’s recap all the classes you’ve seen so far.

Recap:

• `torch.Tensor` – A multi-dimensional array with support for autograd operations like `backward()`. Also holds the gradient w.r.t. the tensor.
• `nn.Module` – Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
• `nn.Parameter` – A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a `Module`.
• `autograd.Function` – Implements forward and backward definitions of an autograd operation. Every `Tensor` operation creates at least a single `Function` node that connects to functions that created a `Tensor` and encodes its history.

At this point, we covered:

• Defining a neural network
• Processing inputs and calling backward

Still left:

• Computing the loss
• Updating the weights of the network

## Loss Function

A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.

There are several different loss functions under the nn package. A simple loss is `nn.MSELoss`, which computes the mean-squared error between the input and the target.

For example:

```
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
```

Out:

```
tensor(1.6418, grad_fn=<MseLossBackward>)
```
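As a sanity check (an addition, not part of the original tutorial): `nn.MSELoss` with its default `'mean'` reduction is just the mean of the squared differences, so this reproduces the same value using the `output` and `target` from the snippet above:

```
# equivalent to criterion(output, target) with the default 'mean' reduction
manual_loss = ((output - target) ** 2).mean()
print(manual_loss)
```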

Now, if you follow `loss` in the backward direction, using its `.grad_fn` attribute, you will see a graph of computations that looks like this:

```
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss
```

So, when we call `loss.backward()`, the whole graph is differentiated w.r.t. the loss, and all Tensors in the graph that have `requires_grad=True` will have their `.grad` Tensor accumulated with the gradient.

For illustration, let us follow a few steps backward:

```
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
```

Out:

```
<MseLossBackward object at 0x7f70e59142b0>
<AddmmBackward object at 0x7f70e5914a90>
<AccumulateGrad object at 0x7f70e5914a90>
```

## Backprop

To backpropagate the error all we have to do is call `loss.backward()`. You need to clear the existing gradients though, else gradients will be accumulated on top of the existing gradients.
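To see this accumulation behaviour in isolation, here is a tiny standalone demonstration on a toy tensor (an illustration, not part of the tutorial's network):

```
w = torch.ones(2, requires_grad=True)
(3 * w).sum().backward()
print(w.grad)   # tensor([3., 3.])
(3 * w).sum().backward()
print(w.grad)   # tensor([6., 6.]) -- the second backward added onto the first
```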

Now we shall call `loss.backward()`, and have a look at conv1’s bias gradients before and after the backward.

```
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
```

Out:

```
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0133, -0.0196,  0.0083, -0.0036,  0.0360,  0.0092])
```

Now, we have seen how to use loss functions.

The neural network package contains various modules and loss functions that form the building blocks of deep neural networks. A full list with documentation is here.

The only thing left to learn is:

• Updating the weights of the network

## Update the weights

The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):

`weight = weight - learning_rate * gradient`

We can implement this using simple Python code:

```
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
```

However, as you use neural networks, you want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, we built a small package, `torch.optim`, that implements all these methods. Using it is very simple:

```
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
```
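In a real script these five steps sit inside a loop over your data. A sketch, assuming a hypothetical `dataloader` that yields `(input, target)` mini-batches:

```
for epoch in range(2):               # loop over the dataset twice
    for input, target in dataloader:
        optimizer.zero_grad()        # zero the gradient buffers
        output = net(input)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()             # update the weights
```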

Note

Observe how the gradient buffers had to be manually set to zero using `optimizer.zero_grad()`. This is because gradients are accumulated as explained in the Backprop section.
