Eroxl's Notes
Categorical Cross-Entropy Loss

Categorical Cross-Entropy Loss is a popular cost function used in multi-class classification problems. It calculates the error between the predicted probability distribution and the actual probability distribution of the classes.

The formula for categorical cross-entropy loss is:

$$
L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)
$$

  • Definitions
    • $C$ denotes the number of categories
    • $i$ denotes the category
    • $y_i$ denotes the target value
    • $\hat{y}_i$ denotes the predicted value.
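
As a quick sanity check, the formula can be sketched in plain Python. The distributions below are made-up example values, not from any particular model:

```python
import math

def categorical_cross_entropy(y_true, y_pred):
    # L = -sum_i y_i * log(yhat_i)
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

# One-hot target: class 1 is the correct class
y_true = [0.0, 1.0, 0.0]
y_pred = [0.1, 0.7, 0.2]  # model's predicted probabilities
loss = categorical_cross_entropy(y_true, y_pred)
```

Note that only the term for the correct class contributes to the sum, since every other $y_i$ is zero.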

This can be simplified further when one-hot encoding is used, since every $y_i$ is $0$ except for the correct class $k$, where $y_k = 1$:

$$
L = -\log(\hat{y}_k)
$$

where $\hat{y}_k$ is the predicted probability of the correct class.
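
With a one-hot target, the whole sum collapses to a single term, so the loss is just the negative log of one prediction. A minimal sketch with made-up example values:

```python
import math

y_pred = [0.1, 0.7, 0.2]  # predicted probabilities
k = 1                     # index of the correct class

# One-hot shortcut: no sum needed, only the correct class matters
loss = -math.log(y_pred[k])
```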

In some algorithms it is possible for $\hat{y}_i$ to be equal to $0$. This causes issues, as $\log(0)$ is undefined. To fix this, the value is clipped to the range $[\epsilon, 1 - \epsilon]$, where $\epsilon$ is a very small value (a common choice is $10^{-7}$).

This means that the loss function can then be re-written formally as:

$$
L = -\sum_{i=1}^{C} y_i \log\big(\mathrm{clip}(\hat{y}_i,\ \epsilon,\ 1 - \epsilon)\big)
$$
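
A sketch of the clipped version, using $\epsilon = 10^{-7}$ as an example value. Without clipping, a predicted probability of exactly $0$ for the correct class would raise a math error:

```python
import math

EPSILON = 1e-7  # example value; frameworks pick their own default

def clip(p, eps=EPSILON):
    # Constrain p to the range [eps, 1 - eps]
    return min(max(p, eps), 1.0 - eps)

def categorical_cross_entropy_clipped(y_true, y_pred):
    return -sum(t * math.log(clip(p)) for t, p in zip(y_true, y_pred))

# A prediction of exactly 0 for the correct class no longer crashes;
# it just produces a large (but finite) loss of -log(EPSILON)
loss = categorical_cross_entropy_clipped([0.0, 1.0, 0.0], [0.5, 0.0, 0.5])
```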

Derivative

The derivative of the loss with respect to each predicted value is:

$$
\frac{\partial L}{\partial \hat{y}_i} = -\frac{y_i}{\hat{y}_i}
$$

  • Definitions
    • $C$ denotes the number of categories
    • $y_i$ denotes the target value
    • $\hat{y}_i$ denotes the predicted value.
Note

Again, a small value $\epsilon$ should be added to $\hat{y}_i$ to prevent divide-by-zero issues.
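
The gradient can be sketched the same way, with $\epsilon$ added to the denominator as the note suggests. The input values are made-up examples:

```python
def cce_gradient(y_true, y_pred, eps=1e-7):
    # dL/dyhat_i = -y_i / yhat_i; eps guards against division by zero
    return [-t / (p + eps) for t, p in zip(y_true, y_pred)]

# With a one-hot target, every entry of the gradient is zero except
# the one for the correct class
grad = cce_gradient([0.0, 1.0, 0.0], [0.1, 0.7, 0.2])
```

This matches the intuition behind the loss: the gradient is steepest when the predicted probability for the correct class is small, pushing the model to raise it.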