Eroxl's Notes
Gradient Descent with Momentum

Gradient descent with momentum is the process of adding momentum to the changes that gradient descent makes to the parameters of an algorithm. This is done by introducing a momentum term into the standard update equation.

This momentum term is included in the change of our parameters as follows:

    Δθₜ = γΔθₜ₋₁ − η∇J(θ)

This term can then be used when updating the parameters of the algorithm:

    θ ← θ + Δθₜ

  • Definitions
    • θ is the current value of the parameters of the algorithm
    • η is the learning rate of the algorithm
    • Δθₜ is the change in the parameters of the algorithm during iteration t
    • ∇J(θ) is the gradient of the multivariable function being optimized with respect to the parameters
    • γ is the "decay factor", which controls how much the momentum influences the change in the parameters, and is usually in the range [0, 1)
Note

When the decay factor (γ) is 0, the algorithm becomes normal gradient descent, as the momentum term is ignored.
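The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name, the test function f(x) = x², and the hyperparameter values (lr, gamma, steps) are all chosen here for demonstration.

```python
import numpy as np

def momentum_gradient_descent(grad, theta0, lr=0.1, gamma=0.9, steps=200):
    """Minimize a function via gradient descent with momentum.

    grad: callable returning the gradient of the objective at theta.
    """
    theta = np.asarray(theta0, dtype=float)
    delta = np.zeros_like(theta)  # momentum term, starts at zero
    for _ in range(steps):
        # Momentum update: delta_t = gamma * delta_{t-1} - lr * gradient
        delta = gamma * delta - lr * grad(theta)
        # Parameter update: theta <- theta + delta_t
        theta = theta + delta
    return theta

# Example: minimize f(x) = x^2, whose gradient is 2x (minimum at x = 0)
result = momentum_gradient_descent(lambda t: 2 * t, theta0=[5.0])
```

Note that with gamma=0 the delta assignment collapses to `delta = -lr * grad(theta)`, recovering plain gradient descent, which matches the note above.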