Sigmoid vs Softmax

Tags: ml python

Sigmoid and softmax are both functions that turn model scores into values that are easier to interpret, but they solve different kinds of classification problems.

Rule of thumb

Sigmoid = independent yes/no decisions.
Softmax = choose one class out of many.

Quick answer

Intuition

Formulas

Sigmoid

\[\sigma(x) = \frac{1}{1 + e^{-x}}\]

Softmax

\[\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}\]
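
As a quick check on how the two formulas relate, here is a worked case under an illustrative assumption that is not part of the definitions above: take K = 2 and fix the second score at 0. The softmax of the first score then reduces to the sigmoid of that score:

\[\text{softmax}(z_1) = \frac{e^{z_1}}{e^{z_1} + e^{0}} = \frac{e^{z_1}}{e^{z_1} + 1} = \frac{1}{1 + e^{-z_1}} = \sigma(z_1)\]

So in this two-class setting, sigmoid can be read as a softmax over one score and a fixed reference score of 0.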

Visualization

Key differences

| Aspect | Sigmoid | Softmax |
| --- | --- | --- |
| Input | One score | A vector of scores |
| Output | One value in (0, 1) | A probability distribution |
| Sum to 1? | No | Yes |
| Relationship between outputs | Independent | Dependent and competing |
| Best for | Binary or multi-label classification | Multi-class single-label classification |
| Example question | "Is this email spam?" | "Is this image a cat, dog, or horse?" |

Tiny example

Suppose a model produces these raw scores:

[2.0, 1.0, 0.1]

If you apply sigmoid to each score independently, you get:

[0.881, 0.731, 0.525]

These values do not sum to 1, because each class is evaluated separately.

If you apply softmax, you get:

[0.659, 0.242, 0.099]

These values do sum to 1, so they can be interpreted as a probability distribution across classes.
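
The numbers in this example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names `sigmoid_out` and `softmax_out` are illustrative, and rounding to three decimals is an assumption made to match the figures quoted above.

```python
import math


def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def softmax(scores: list[float]) -> list[float]:
    # Subtracting the max keeps math.exp from overflowing; the result is unchanged.
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


scores = [2.0, 1.0, 0.1]

# Sigmoid is applied to each score on its own.
sigmoid_out = [round(sigmoid(s), 3) for s in scores]
# Softmax looks at all scores together.
softmax_out = [round(p, 3) for p in softmax(scores)]

print(sigmoid_out)                 # [0.881, 0.731, 0.525]
print(softmax_out)                 # [0.659, 0.242, 0.099]
print(round(sum(sigmoid_out), 3))  # 2.137, not 1
```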

Why this matters

Softmax outputs share a fixed total of 1, so if one probability goes up, the others must collectively go down.
With sigmoid, each output is computed on its own, so multiple outputs can be high at the same time.
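
A small experiment makes this coupling visible. The sketch below is illustrative only: the score lists are made-up inputs, and it simply raises the first score from 2.0 to 3.0 to see what happens to the other outputs under each function.

```python
import math


def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def softmax(scores: list[float]) -> list[float]:
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


# Only the first raw score changes between the two runs.
before = softmax([2.0, 1.0, 0.1])
after = softmax([3.0, 1.0, 0.1])

# Softmax: the first probability rises and the other two fall.
print([round(p, 3) for p in before])
print([round(p, 3) for p in after])

# Sigmoid: the second and third outputs are untouched by the change.
sig_before = [sigmoid(s) for s in [2.0, 1.0, 0.1]]
sig_after = [sigmoid(s) for s in [3.0, 1.0, 0.1]]
print(sig_before[1:] == sig_after[1:])

# Two equally strong scores: sigmoid rates both high, softmax splits the total.
both_high_sigmoid = [sigmoid(s) for s in [3.0, 3.0]]
both_high_softmax = softmax([3.0, 3.0])
print([round(p, 3) for p in both_high_sigmoid])
print(both_high_softmax)
```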

When to use which

Use sigmoid when

Use softmax when

Interview-ready answers

Common mistakes

Simple Python

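import math


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never receives a large positive argument,
    # which would raise OverflowError for very negative x.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    exp_x = math.exp(x)
    return exp_x / (1 + exp_x)


def softmax(scores: list[float]) -> list[float]:
    # Subtract the max score before exponentiating; the result is unchanged,
    # but math.exp can no longer overflow.
    max_score = max(scores)
    exp_scores = [math.exp(score - max_score) for score in scores]
    total = sum(exp_scores)
    return [value / total for value in exp_scores]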
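
The `max_score` subtraction in `softmax` above is there for numerical stability. The sketch below shows why, under some assumptions: `naive_softmax` and `stable_softmax` are hypothetical names introduced only for this comparison, and the large scores are made-up inputs chosen to trigger the overflow.

```python
import math


def naive_softmax(scores: list[float]) -> list[float]:
    # Exponentiates the raw scores directly; math.exp overflows for large inputs.
    exp_scores = [math.exp(s) for s in scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


def stable_softmax(scores: list[float]) -> list[float]:
    # Shifting every score by the max leaves the output unchanged
    # but keeps every exponent at or below 0.
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


large_scores = [1000.0, 1001.0, 1002.0]

try:
    naive_softmax(large_scores)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

stable = stable_softmax(large_scores)

print(naive_overflowed)               # True
print([round(p, 3) for p in stable])  # [0.09, 0.245, 0.665]
```

The stable version gives the same distribution for `[1000.0, 1001.0, 1002.0]` as for `[0.0, 1.0, 2.0]`, because softmax depends only on the differences between scores.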

One-line memory trick

