Eroxl's Notes
RMSProp
Aliases: Root Mean Square Propagation

RMSProp is a variation of gradient descent that maintains an individual learning rate for each parameter (similar to AdaGrad). This helps normalize parameter updates, since the gradients of some parameters can be much larger than those of others. RMSProp does this by dividing the base learning rate by a running average of recent squared gradients:

$$v_t = \beta v_{t-1} + (1 - \beta) g_t^2$$

  • Definitions
    • $t$ is the current iteration
    • $g_t$ is the gradient of the multivariable function that is being optimized with respect to the parameters, $\nabla_\theta f(\theta_t)$
    • $\beta$ is the "forgetting factor", which controls how much past gradients influence the current learning rate and is usually around $0.9$
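The running average above can be sketched in a few lines of Python; the function name and the example gradient values are illustrative, not part of any particular library:

```python
def update_running_average(v, grad, beta=0.9):
    # v_t = beta * v_{t-1} + (1 - beta) * g_t^2, element-wise
    return [beta * vi + (1 - beta) * gi ** 2 for vi, gi in zip(v, grad)]

# One step starting from v_0 = 0 with a hypothetical gradient:
v = update_running_average([0.0, 0.0], [3.0, 0.1])
print(v)  # approximately [0.9, 0.001]
```

Note how the large gradient component (3.0) produces a much larger running average than the small one (0.1), which is what later lets each parameter get its own effective learning rate.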

This is then used to calculate the new parameters of the algorithm:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t}} g_t$$

where $\eta$ is the base learning rate.

Note

It's possible for the denominator to be 0, so usually a small value $\epsilon$ is added (for example $\epsilon = 10^{-8}$).

This means that the parameter update would be rewritten formally as

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t + \epsilon}} g_t$$
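Putting the running average and the parameter update together, a minimal sketch of one RMSProp step looks like the following. The function name, default hyperparameters, and the quadratic test function are assumptions for illustration:

```python
import math

def rmsprop_step(theta, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    # Update the running average of squared gradients, then take a
    # per-parameter step scaled by 1 / sqrt(v + eps).
    v = [beta * vi + (1 - beta) * gi ** 2 for vi, gi in zip(v, grad)]
    theta = [ti - lr * gi / math.sqrt(vi + eps)
             for ti, gi, vi in zip(theta, grad, v)]
    return theta, v

# Minimizing a hypothetical test function f(x, y) = x^2 + 10 * y^2,
# whose gradient is (2x, 20y):
theta, v = [1.0, 1.0], [0.0, 0.0]
for _ in range(300):
    grad = [2 * theta[0], 20 * theta[1]]
    theta, v = rmsprop_step(theta, grad, v)
# Both parameters approach 0 despite very different gradient scales.
```

Because each parameter's step is divided by the root of its own running average, the steeply-sloped direction and the shallow direction make similarly sized progress, which is the normalizing effect described above.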