# Manjeet Dahiya

This post presents variants of the gradient descent algorithm.

Standard gradient descent is also called batch gradient descent. It computes the gradient of the loss over the complete training dataset and then updates the parameters once (in deep learning, the gradient itself is computed by backpropagation). One such pass over the dataset constitutes an epoch of training.

Let us describe this formally. Given a loss function $L$ and the $i$th training example $X_i$ (input variables/features), the loss over the training set is $\dfrac{1}{m} \sum \limits_{i=1}^m L(X_i)$, where $m$ is the number of training samples.

The gradient descent parameter update equation is:

$$W_{t+1} = W_t - \eta \, \nabla_W \left( \dfrac{1}{m} \sum \limits_{i=1}^m L(X_i) \right)$$

• $W_t$ denotes the parameters at epoch $t$.
• $\eta$ is the learning rate.
• The parameters are updated exactly once per epoch.
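To make the batch update concrete, here is a minimal sketch in plain Python. The post leaves the model and the loss $L$ abstract, so the one-parameter linear model $y \approx w x$, the squared-error loss, the toy data, and the function name `batch_gradient_descent` are all assumptions chosen for illustration only.

```python
def batch_gradient_descent(xs, ys, w=0.0, eta=0.1, epochs=100):
    """Fit y ~ w * x by batch gradient descent on the mean squared error.

    Assumed setup for illustration: one scalar parameter w and the
    per-example loss L = (w * x - y) ** 2.
    """
    m = len(xs)
    for _ in range(epochs):
        # Gradient of (1/m) * sum((w * x_i - y_i)^2) with respect to w,
        # computed over all m training examples.
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / m
        # Exactly one parameter update per epoch.
        w = w - eta * grad
    return w


# Toy data generated by y = 2x (hypothetical example values).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = batch_gradient_descent(xs, ys, eta=0.05, epochs=200)
print(round(w, 4))
```

Each loop iteration touches every training example before changing `w`, which is what distinguishes batch gradient descent from variants that update after seeing only part of the data.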

• $b$ is the mini-batch size.