Understanding PyTorch Autograd
PyTorch autograd is a masterpiece. It lifts the burden of calculating the many partial derivatives, a.k.a. gradients, that training requires.
torch.autograd provides classes and functions implementing automatic differentiation of arbitrary scalar-valued functions. In other words, autograd is PyTorch's automatic differentiation package: it computes gradients automatically by tracking operations in a computation graph.
Why do we need autograd?
Calculating derivatives is at the core of training deep neural networks. The individual calculations may be simple, but working them out by hand is tedious and error-prone. As a model becomes more complex, deriving the gradient of each and every function by hand becomes impractical. PyTorch's autograd takes all of this tedious work off our hands.
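To see the kind of work autograd saves us, here is a minimal sketch that computes the gradient of f(x) = x³ at x = 2 three ways: with autograd, with the hand-derived formula 3x², and with a finite-difference estimate (the step size `eps` is an arbitrary choice for illustration):

```python
import torch

# Gradient of f(x) = x**3 at x = 2, three ways.
x = torch.tensor([2.], requires_grad=True)
f = x**3
f.backward()                       # autograd fills x.grad

manual = 3 * 2.0**2                # derivative worked out by hand: 3x^2 = 12
eps = 1e-4                         # finite-difference step
numeric = ((2.0 + eps)**3 - (2.0 - eps)**3) / (2 * eps)

print(x.grad.item(), manual, numeric)  # all three agree (~12.0)
```

All three values agree, but only the autograd version scales to models with millions of parameters.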
Let’s understand autograd:
import torch
# create input tensors
a = torch.tensor([5.], requires_grad=True)  # only floating-point and complex tensors can require gradients
b = torch.tensor([6.], requires_grad=True)
a, b
(tensor([5.], requires_grad=True), tensor([6.], requires_grad=True))
By setting requires_grad=True on a tensor, we tell PyTorch to build a computation graph in the background: autograd records every operation performed on that tensor, so the history of the computation is available for computing gradients later.
Let’s compute a function y.
y = a**3 - b**2
y
tensor([89.], grad_fn=<SubBackward0>)
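That grad_fn=&lt;SubBackward0&gt; in the output is a visible piece of the computation graph. As a sketch, we can poke at it directly (next_functions and is_leaf are internals/attributes useful for inspection, not something you need in everyday code):

```python
import torch

a = torch.tensor([5.], requires_grad=True)
b = torch.tensor([6.], requires_grad=True)
y = a**3 - b**2

# Each operation records a backward node; grad_fn links them into a graph.
print(y.grad_fn)                 # the SubBackward0 node for the subtraction
print(y.grad_fn.next_functions)  # backward nodes for a**3 and b**2
print(a.is_leaf, y.is_leaf)      # True False: only leaf tensors get .grad by default
```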
What is the gradient of y with respect to a and b?
$$\begin{aligned} \frac{dy}{da} &= 3a^2 = 75 \\ \frac{dy}{db} &= -2b = -12 \end{aligned}$$
PyTorch can easily find the gradients.
print(a.grad)
print(b.grad)
None
None
It outputs None. This is because we haven't called the .backward() method yet; gradients are only populated during the backward pass.
Tensor.backward() computes the gradient of the current tensor with respect to the graph leaves. So only when y.backward() is called do the gradients backpropagate to the leaf tensors a and b.
y.backward()
We can now access the gradients through the grad attribute.
a.grad
tensor([75.])
b.grad
tensor([-12.])
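One detail worth knowing: .grad accumulates. If we rebuild y and call backward() again, the new gradients are added to the stored ones rather than replacing them, which is why training loops zero the gradients between iterations (optimizers do this via zero_grad()). A minimal sketch:

```python
import torch

a = torch.tensor([5.], requires_grad=True)
b = torch.tensor([6.], requires_grad=True)

# First backward pass fills a.grad with 75.
y = a**3 - b**2
y.backward()

# A second backward pass ADDS to the stored gradient.
y2 = a**3 - b**2
y2.backward()
print(a.grad)      # tensor([150.])  -- 75 + 75

# Reset before the next pass.
a.grad.zero_()
y3 = a**3 - b**2
y3.backward()
print(a.grad)      # tensor([75.])
```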