Manjeet Dahiya

Gradient Descent Variants

This post presents variants of the gradient descent algorithm.

Standard gradient descent

Standard gradient descent is also called batch gradient descent. It updates the parameters using the complete training dataset in a single step: the gradient of the loss is computed over all the training examples (via back propagation in the case of deep learning), and the parameters are then updated. One such pass over the training data constitutes an epoch of training.

Let us describe this formally. Given a loss function $L$ and the $i$-th training example $X_i$ (its input variables/features), the loss over the training set is written as $\dfrac{1}{m} \sum \limits_{i=1}^m L(X_i)$, where $m$ is the number of training examples.

The gradient descent parameter update equation is:

\[W_t = W_{t-1} - \eta \nabla \left(\dfrac{1}{m}\sum \limits_{i=1}^m L(X_i)\right)\]
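As a concrete illustration, here is a minimal sketch of batch gradient descent applied to linear regression with a mean-squared-error loss. The model, dataset, and hyperparameters (`eta`, `epochs`) are illustrative assumptions, not part of the post.

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.1, epochs=100):
    """Fit linear weights W by full-batch gradient descent on the MSE loss."""
    m, n = X.shape
    W = np.zeros(n)
    for _ in range(epochs):
        # Gradient of (1/m) * sum_i (X_i . W - y_i)^2 with respect to W,
        # computed over the complete training set: one epoch, one update.
        grad = (2.0 / m) * X.T @ (X @ W - y)
        W = W - eta * grad          # W_t = W_{t-1} - eta * gradient
    return W

# Toy usage: recover the weights [2, 3] from noiseless synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, 3.0])
print(batch_gradient_descent(X, y))
```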

Stochastic gradient descent

Stochastic gradient descent (SGD) updates the parameters using a single training example at a time: the gradient of the loss of one example $X_i$ is computed and the parameters are updated immediately, so an epoch consists of $m$ updates instead of one. The update equation is:

\[W_t = W_{t-1} - \eta \nabla L(X_i)\]
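Below is a minimal sketch of the same linear regression example trained with stochastic gradient descent; again, the data and hyperparameters are illustrative assumptions. The individual updates are noisier than in batch gradient descent, but there are many more of them per epoch.

```python
import numpy as np

def sgd(X, y, eta=0.01, epochs=20):
    """Fit linear weights W with one update per training example (SGD)."""
    m, n = X.shape
    W = np.zeros(n)
    for _ in range(epochs):
        for i in np.random.permutation(m):   # visit examples in random order
            # Gradient of L(X_i) = (X_i . W - y_i)^2 for a single example.
            grad = 2.0 * (X[i] @ W - y[i]) * X[i]
            W = W - eta * grad               # one update per example
    return W

# Toy usage on the same kind of synthetic data as above.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, 3.0])
print(sgd(X, y))
```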

Mini-batch gradient descent

Mini-batch gradient descent is a middle ground between the two: each update uses a mini-batch of $b$ training examples drawn from the training set, where the batch size $b$ is typically much smaller than $m$. The update equation is:

\[W_t = W_{t-1} - \eta \nabla \left(\dfrac{1}{b}\sum \limits_{i=1}^b L(X_i)\right)\]
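A minimal sketch of mini-batch gradient descent for the same setup follows; the batch size `b` and the other hyperparameters are, again, illustrative assumptions. Averaging the gradient over a small batch reduces the noise of SGD while still giving many updates per epoch, and the batched computation vectorizes well in practice.

```python
import numpy as np

def minibatch_gradient_descent(X, y, eta=0.05, epochs=50, b=16):
    """Fit linear weights W with one update per mini-batch of size b."""
    m, n = X.shape
    W = np.zeros(n)
    for _ in range(epochs):
        order = np.random.permutation(m)          # reshuffle every epoch
        for start in range(0, m, b):
            idx = order[start:start + b]          # the current mini-batch
            Xb, yb = X[idx], y[idx]
            # Gradient of (1/b) * sum over the mini-batch of (X_i . W - y_i)^2.
            grad = (2.0 / len(idx)) * Xb.T @ (Xb @ W - yb)
            W = W - eta * grad                    # one update per mini-batch
    return W

# Toy usage on the same kind of synthetic data as above.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, 3.0])
print(minibatch_gradient_descent(X, y))
```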
