Neural networks can be constructed using the torch.nn package.

Now that you had a glimpse of autograd, nn depends on autograd to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the output.
For example, look at this network that classifies digit images:
It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.
A typical training procedure for a neural network is as follows:
- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient (a minimal sketch of this whole loop follows below)
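Putting these steps together, here is a minimal, self-contained sketch of such a training loop. The tiny linear model, random data, and learning rate below are placeholder stand-ins for illustration only, not the digit-classification network defined in the rest of this tutorial:

import torch
import torch.nn as nn
import torch.optim as optim

# placeholder model and data, just to make the sketch runnable
model = nn.Linear(4, 2)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
dataset = [(torch.randn(1, 4), torch.randn(1, 2)) for _ in range(8)]

for input, target in dataset:          # iterate over a dataset of inputs
    optimizer.zero_grad()              # clear old gradients
    output = model(input)              # process input through the network
    loss = criterion(output, target)   # compute the loss
    loss.backward()                    # propagate gradients back into the parameters
    optimizer.step()                   # update the weights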
Define the network
Let’s define this network:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)
Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
You just have to define the forward function, and the backward function (where gradients are computed) is automatically defined for you using autograd. You can use any of the Tensor operations in the forward function.

The learnable parameters of a model are returned by net.parameters()
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
10
torch.Size([6, 1, 3, 3])
Let’s try a random 32×32 input. Note: expected input size of this net (LeNet) is 32×32. To use this net on the MNIST dataset, please resize the images from the dataset to 32×32.
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
tensor([[-0.0193, 0.1366, -0.1581, -0.0566, -0.0611, 0.1434, 0.0975, 0.0029, 0.0196, 0.1188]], grad_fn=<AddmmBackward>)
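As an aside, to actually use this net on MNIST as noted above, the 28×28 images must be resized to 32×32. Here is a minimal sketch, assuming torchvision is available (this is not part of the tutorial's script):

import torchvision.transforms as transforms
from torchvision.datasets import MNIST

# resize the 28x28 MNIST images to the 32x32 input size this net expects
transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
])
mnist_train = MNIST(root='./data', train=True, download=True, transform=transform)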
Zero the gradient buffers of all parameters and backprop with random gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))
Note: torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, and not a single sample. For example, nn.Conv2d will take in a 4D Tensor of nSamples x nChannels x Height x Width. If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.
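For example, a single image stored as a 3D tensor (channels x height x width) can be given a fake batch dimension like this (a small illustrative snippet):

import torch

single_image = torch.randn(1, 32, 32)   # nChannels x Height x Width, no batch dimension
batched = single_image.unsqueeze(0)     # insert a fake batch dimension at position 0
print(batched.size())                   # torch.Size([1, 1, 32, 32])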
Before proceeding further, let’s recap all the classes you’ve seen so far.
- torch.Tensor – A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.
- nn.Module – Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
- nn.Parameter – A kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module (see the small example after this list).
- autograd.Function – Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to the functions that created a Tensor and encodes its history.
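To illustrate the nn.Parameter point from the recap, here is a tiny module (an illustrative example, not part of the tutorial code) where simply assigning a Parameter as an attribute registers it:

import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self):
        super(Scale, self).__init__()
        # assigned as an attribute of the Module, so it is registered automatically
        self.weight = nn.Parameter(torch.ones(3))

    def forward(self, x):
        return x * self.weight

m = Scale()
print(list(m.parameters()))   # the registered weight shows up here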
- At this point, we covered:
- Defining a neural network
- Processing inputs and calling backward
- Still Left:
- Computing the loss
- Updating the weights of the network
A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.
There are several different loss functions under the nn package. A simple loss is nn.MSELoss, which computes the mean-squared error between the input and the target.
output = net(input)
target = torch.randn(10)      # a dummy target, for example
target = target.view(1, -1)   # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
Now, if you follow loss in the backward direction, using its .grad_fn attribute, you will see a graph of computations that looks like this:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss
So, when we call loss.backward(), the whole graph is differentiated w.r.t. the loss, and all Tensors in the graph that have requires_grad=True will have their .grad Tensor accumulated with the gradient.
For illustration, let us follow a few steps backward:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
<MseLossBackward object at 0x7f70e59142b0>
<AddmmBackward object at 0x7f70e5914a90>
<AccumulateGrad object at 0x7f70e5914a90>
Backprop

To backpropagate the error all we have to do is to call loss.backward(). You need to clear the existing gradients though, else gradients will be accumulated to existing gradients.
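To see the accumulation concretely, here is a tiny standalone example (independent of the network above) where calling backward() twice adds into .grad instead of overwriting it:

import torch

x = torch.ones(2, requires_grad=True)
(x * 3).sum().backward()
print(x.grad)        # tensor([3., 3.])
(x * 3).sum().backward()
print(x.grad)        # tensor([6., 6.]) -- accumulated, not overwritten
x.grad.zero_()       # clearing the buffer resets it to zero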
Now we shall call loss.backward(), and have a look at conv1's bias gradients before and after the backward.
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0133, -0.0196,  0.0083, -0.0036,  0.0360,  0.0092])
Now, we have seen how to use loss functions.
The neural network package contains various modules and loss functions that form the building blocks of deep neural networks. A full list with documentation is here.
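For instance, nn.CrossEntropyLoss is another commonly used loss for a 10-class classifier like the one above; here is a brief illustrative snippet (not part of the original tutorial code):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(1, 10)   # raw, unnormalized scores for 10 classes
labels = torch.tensor([3])    # the correct class index for this one sample
loss = criterion(logits, labels)
print(loss)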
The only thing left to learn is:
- Updating the weights of the network
Update the weights
The simplest update rule used in practice is Stochastic Gradient Descent (SGD):

weight = weight - learning_rate * gradient

We can implement this using simple Python code:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
However, as you use neural networks, you want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, we built a small package, torch.optim, that implements all these methods. Using it is very simple:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
Observe how the gradient buffers had to be manually set to zero using optimizer.zero_grad(). This is because gradients are accumulated, as explained in the Backprop section.
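As a side note, switching to one of the other update rules mentioned earlier only changes how the optimizer is constructed; for example (an illustrative variation, not part of the original script):

import torch.optim as optim

# the rest of the training loop stays the same; only the constructor changes
optimizer = optim.Adam(net.parameters(), lr=0.001)
# or, for instance:
# optimizer = optim.RMSprop(net.parameters(), lr=0.001)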