The softmax function takes a vector of real numbers as input and normalizes it into a probability distribution whose probabilities are proportional to the exponentials of the input numbers. As a result, every entry is scaled into the range $(0, 1)$ and the entries of the output vector sum to 1. The softmax function is usually denoted by the symbol $\sigma$.
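The softmax function can be defined as follows:

$$\sigma(\mathbf{z}) = \left[ \frac{e^{z_1}}{\sum_{j=1}^{K} e^{z_j}}, \; \ldots, \; \frac{e^{z_K}}{\sum_{j=1}^{K} e^{z_j}} \right]$$

The softmax on a single element can be defined as follows:

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

- Definitions
  - $z_i$ is the element at index $i$ of the input vector $\mathbf{z}$.
  - $K$ is the total number of elements in the input vector $\mathbf{z}$.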
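The definition can be sketched in a few lines of plain Python: exponentiate each entry, then divide by the sum of all the exponentials. The function name `softmax` and the sample input are illustrative choices. Subtracting the maximum entry before exponentiating is a common numerical-stability trick added here as an assumption; it is not part of the definition and does not change the result.

```python
import math

def softmax(z):
    """Return the softmax of a list of numbers as a list of probabilities."""
    # Assumption (not part of the definition): shift by the maximum so that
    # exp() cannot overflow. The shift cancels in the ratio below.
    m = max(z)
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)  # denominator: sum of all exponentials
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # three values in (0, 1), larger inputs get larger shares
print(sum(probs))  # approximately 1.0
```

Without the shift, an input such as `[1000.0, 1000.0]` would overflow in `math.exp`; with it, the function returns equal probabilities for equal inputs.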
Derivative
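Because the softmax function is a vector function, mapping an input vector $\mathbf{z} \in \mathbb{R}^K$ to an output vector in $\mathbb{R}^K$, its derivative is a Jacobian matrix.
The partial derivative of the $i$-th output $\sigma(\mathbf{z})_i$ with respect to the $j$-th input $z_j$ can be defined as

$$\frac{\partial \sigma(\mathbf{z})_i}{\partial z_j} = \sigma(\mathbf{z})_i \left( \delta_{ij} - \sigma(\mathbf{z})_j \right)$$

where $\delta_{ij}$ is the Kronecker delta, equal to 1 when $i = j$ and 0 otherwise.
This can be re-written by setting the result of $\sigma(\mathbf{z})_i$ to be equal to $s_i$, making the partial derivative

$$\frac{\partial s_i}{\partial z_j} = s_i \left( \delta_{ij} - s_j \right)$$

This makes the final Jacobian matrix for the softmax function

$$J = \begin{bmatrix} s_1 (1 - s_1) & -s_1 s_2 & \cdots & -s_1 s_K \\ -s_2 s_1 & s_2 (1 - s_2) & \cdots & -s_2 s_K \\ \vdots & \vdots & \ddots & \vdots \\ -s_K s_1 & -s_K s_2 & \cdots & s_K (1 - s_K) \end{bmatrix}$$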
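As a rough check on the Jacobian, the sketch below builds it from the softmax outputs, using $s_i (1 - s_i)$ on the diagonal and $-s_i s_j$ off the diagonal, and compares it with a central finite-difference approximation. The helper names `softmax_jacobian` and `numerical_jacobian`, the step size `eps`, and the sample input are all illustrative assumptions.

```python
import math

def softmax(z):
    """Softmax of a list of numbers (max-shifted for numerical stability)."""
    m = max(z)
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    """Return the K x K Jacobian as nested lists, J[i][j] = d s_i / d z_j."""
    s = softmax(z)
    k = len(s)
    # Diagonal entries are s_i * (1 - s_i); off-diagonal entries are -s_i * s_j.
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(k)]
            for i in range(k)]

def numerical_jacobian(z, eps=1e-6):
    """Approximate the Jacobian with central finite differences."""
    k = len(z)
    jac = [[0.0] * k for _ in range(k)]
    for j in range(k):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        s_plus, s_minus = softmax(z_plus), softmax(z_minus)
        for i in range(k):
            jac[i][j] = (s_plus[i] - s_minus[i]) / (2 * eps)
    return jac

z = [1.0, 2.0, 3.0]
analytic = softmax_jacobian(z)
numeric = numerical_jacobian(z)
max_error = max(abs(analytic[i][j] - numeric[i][j])
                for i in range(len(z)) for j in range(len(z)))
print(max_error)  # small; limited by finite-difference and rounding error
```

Two properties are visible in the result: the matrix is symmetric, and each row sums to zero, because the outputs always sum to 1 and so cannot all increase together.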