The learning rate in gradient descent is the "step" size: the size of the update applied to the model parameters at each iteration of the algorithm. It is usually denoted by the Greek letter α (alpha) or η (eta).
Choosing the right learning rate is crucial to the performance of gradient descent, as it greatly affects both the speed and the stability of the algorithm. A learning rate that is too small results in slow convergence, while one that is too large can cause the algorithm to overshoot the minimum or even diverge.
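This behavior is easy to see on a toy problem. Below is a minimal sketch that minimizes f(x) = x², whose gradient is f'(x) = 2x; the objective, starting point, and specific rates are illustrative assumptions, not part of any particular library.

```python
def gradient_descent(lr, x0=5.0, steps=20):
    """Run `steps` updates x <- x - lr * f'(x) on f(x) = x^2 and return the final x."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # gradient of x^2 is 2x
    return x

print(gradient_descent(lr=0.01))  # too small: after 20 steps, still far from the minimum at 0
print(gradient_descent(lr=0.4))   # well chosen: converges very close to 0
print(gradient_descent(lr=1.1))   # too large: each step overshoots, and x diverges
```

With lr=1.1, each update multiplies x by (1 - 2.2) = -1.2, so the iterates oscillate with growing magnitude — the divergence described above.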
The learning rate can be any positive number but is typically small, such as 0.1, 0.01, or 0.001.
The learning rate can also be adjusted during training: learning rate scheduling changes it at predefined points (for example, every few epochs), while learning rate decay shrinks it gradually over time.
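The two approaches can be sketched as simple functions of the epoch number; the initial rate, decay factor, and step interval below are illustrative assumptions.

```python
import math

def step_schedule(initial_lr, epoch, drop=0.5, every=10):
    """Learning rate scheduling: halve the rate every `every` epochs."""
    return initial_lr * drop ** (epoch // every)

def exponential_decay(initial_lr, epoch, k=0.05):
    """Learning rate decay: shrink the rate smoothly and continuously over time."""
    return initial_lr * math.exp(-k * epoch)

for epoch in (0, 10, 20):
    print(epoch, step_schedule(0.1, epoch), exponential_decay(0.1, epoch))
```

The step schedule produces discrete drops (0.1, 0.05, 0.025, ...), whereas exponential decay lowers the rate a little at every epoch.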
In some gradient descent algorithms, usually called adaptive gradient descent algorithms, such as AdaGrad or RMSProp, the effective learning rate is adjusted per parameter based on the gradients observed so far.