Adam
Aliases: Adaptive Moment Estimation

Adam is an extension of RMSProp that implements some of the features of gradient descent with momentum.

The momentum is defined as

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, \nabla_\theta f(\theta_t)$$

  • Definitions
    • $\theta_t$ is the current value of the parameters of the algorithm
    • $m_t$ is the momentum term, which determines the change in the parameters of the algorithm during the iteration
    • $\nabla_\theta f(\theta_t)$ is the gradient of the multivariable function that is being optimized with respect to the parameters
    • $\beta_1$ is the "decay factor" which controls how much the momentum influences the change in the gradient and is usually in the range $[0, 1)$ (commonly $0.9$)
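As a minimal sketch of this first update (the function name and defaults below are illustrative, not from the note):

```python
import numpy as np

def first_moment(m_prev, grad, beta1=0.9):
    """Exponentially decaying average of past gradients (the momentum term m_t).

    beta1 is the decay factor; 0.9 is a common default in [0, 1).
    """
    return beta1 * m_prev + (1 - beta1) * grad

# With a constant gradient of 1 and m_0 = 0, the recurrence solves to
# m_t = 1 - beta1**t, so m_t approaches 1 as t grows.
m = np.zeros(2)
for _ in range(3):
    m = first_moment(m, np.ones(2))
```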

The averaging (inherited from RMSProp) is then defined as

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, \big(\nabla_\theta f(\theta_t)\big)^2$$

  • Definitions
    • $t$ is the current iteration
    • $v_t$ is the exponentially decaying average of the squared gradients
    • $\nabla_\theta f(\theta_t)$ is the gradient of the multivariable function that is being optimized with respect to the parameters
    • $\beta_2$ is the "forgetting factor" which controls how much past gradients influence the current learning rate and is usually in the range $[0, 1)$ (commonly $0.999$)
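The squared-gradient average can be sketched the same way (again, the function name and defaults are illustrative):

```python
import numpy as np

def second_moment(v_prev, grad, beta2=0.999):
    """Exponentially decaying average of squared gradients, as in RMSProp.

    beta2 is the forgetting factor; 0.999 is a common default in [0, 1).
    """
    return beta2 * v_prev + (1 - beta2) * grad ** 2

# A single step from v_0 = 0 with gradient 2 gives (1 - beta2) * 2**2.
v = second_moment(np.zeros(1), np.full(1, 2.0))
```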

This average $v_t$ and the change in parameters $m_t$ are then used to calculate the new parameters of the algorithm

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t}}\, m_t$$

  • Definitions
    • $\theta_t$ is the current value of the parameters of the algorithm
    • $\eta$ is the base learning rate of the algorithm
Note

It's possible for the denominator $\sqrt{v_t}$ to be $0$, so a small value $\epsilon$ is usually added (for example $\epsilon = 10^{-8}$, a common default).

This means that the parameter update would be rewritten formally as

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon}\, m_t$$
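Putting the three formulas together, one full update step can be sketched as follows (a simplified version without the bias correction some presentations of Adam add; names and defaults are illustrative):

```python
import numpy as np

def adam_step(theta, m, v, grad, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam parameter update, combining the momentum term, the
    squared-gradient average, and the final update rule from the note."""
    m = beta1 * m + (1 - beta1) * grad            # momentum term m_t
    v = beta2 * v + (1 - beta2) * grad ** 2       # squared-gradient average v_t
    theta = theta - lr * m / (np.sqrt(v) + eps)   # eps keeps the denominator nonzero
    return theta, m, v

# Minimizing f(x) = x^2 (gradient 2x) moves x toward the minimum at 0.
x = np.array([1.0])
m, v = np.zeros(1), np.zeros(1)
for _ in range(500):
    x, m, v = adam_step(x, m, v, 2 * x)
```

Because the update divides $m_t$ by $\sqrt{v_t}$, the effective step size is roughly the learning rate regardless of the raw gradient magnitude, which is what makes the method adaptive.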