
Multi Class Forecasts: Extending Brier Beyond Binary

January 1, 2026 · Advanced Metrics

Binary is the common case, but not the only one

The Brier score is usually introduced for a binary event (YES or NO), but many forecasting problems have more than two outcomes:

• Team A win vs Draw vs Team B win

• Which candidate wins (multiple candidates)

• Which category occurs (multiple categories)

The same idea extends naturally: score squared error across a probability vector.

Multi class setup

Assume you have K possible outcomes. Your forecast is a probability vector:

p = (p1, p2, ..., pK)

Where:

• each pk is between 0 and 1

• the probabilities sum to 1

The outcome is represented as a one-hot vector:

o = (o1, o2, ..., oK)

Where exactly one ok is 1 (the realized class) and all others are 0.
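The setup above can be sketched in a few lines of Python (the names and the 0-based indexing are illustrative, not part of any standard API):

```python
# Multi-class setup: a forecast is a probability vector over K outcomes,
# and the realized outcome is a one-hot vector of the same length.
forecast = [0.50, 0.20, 0.30]   # p = (p1, ..., pK)

def one_hot(realized_index, k):
    """Encode the realized class (0-based index) as a one-hot vector of length k."""
    return [1.0 if i == realized_index else 0.0 for i in range(k)]

outcome = one_hot(2, 3)                 # the third class occurred

assert abs(sum(forecast) - 1.0) < 1e-9  # probabilities must sum to 1
assert sum(outcome) == 1.0              # exactly one class realized
```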

The multi class Brier formula

The most common definition is:

BS = sum((pk - ok)^2) for k = 1..K

That is squared error across all classes, added together.
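As a minimal sketch, the formula is one line of Python (the function name is ours, not a library call):

```python
def multiclass_brier(probs, outcome):
    """Multi-class Brier score: squared error summed across all K classes.

    probs   -- forecast probability vector, summing to 1
    outcome -- one-hot vector with a 1 at the realized class
    """
    return sum((p - o) ** 2 for p, o in zip(probs, outcome))

# A certain forecast on the realized class scores 0, the best possible:
multiclass_brier([0.0, 1.0, 0.0], [0.0, 1.0, 0.0])
```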

Normalization: two common choices

You will see two conventions:

• Unnormalized: use the raw sum across K classes

• Normalized: divide by K (or sometimes K-1) so scores are more comparable across different K

This is a methodology choice. If you publish a scorecard, disclose which one you use. See Scorecard Methodology.
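Both conventions can live behind one flag; a sketch of the divide-by-K variant (again, illustrative names):

```python
def multiclass_brier(probs, outcome, normalize=False):
    """Unnormalized by default; pass normalize=True to divide by K."""
    bs = sum((p - o) ** 2 for p, o in zip(probs, outcome))
    return bs / len(probs) if normalize else bs
```

Whichever branch you take, the disclosure matters more than the choice: scores from the two conventions differ by a factor of K.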

Worked example: three outcomes

Suppose a match has 3 outcomes:

• A win

• Draw

• B win

You forecast:

• pA = 0.50

• pD = 0.20

• pB = 0.30

The match result is B win, so:

• oA = 0

• oD = 0

• oB = 1

Compute squared errors:

• (0.50 - 0)^2 = 0.25

• (0.20 - 0)^2 = 0.04

• (0.30 - 1)^2 = 0.49

Sum:

• BS = 0.25 + 0.04 + 0.49 = 0.78

If you use divide by K normalization (K = 3):

• BS_normalized = 0.78 / 3 = 0.26
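The arithmetic above can be checked directly:

```python
p = [0.50, 0.20, 0.30]   # forecast: A win, Draw, B win
o = [0.0, 0.0, 1.0]      # realized result: B win

squared_errors = [(pk - ok) ** 2 for pk, ok in zip(p, o)]  # [0.25, 0.04, 0.49]
bs = sum(squared_errors)       # 0.78
bs_normalized = bs / len(p)    # 0.78 / 3 = 0.26
```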

Does it reduce to the binary case?

Yes. When K = 2, write the binary forecast as p and the outcome as o. The two-class sum is (p - o)^2 + ((1 - p) - (1 - o))^2, and the two terms are equal, so the total is exactly twice the binary Brier score. Rankings are unchanged, and dividing by K = 2 recovers the binary definition exactly.
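A quick numeric check of that reduction (the 0.7 forecast is an illustrative number): the two-class sum works out to exactly twice the binary squared error, so dividing by K recovers the binary score.

```python
p_yes = 0.7          # binary forecast for YES
o_yes = 1.0          # YES occurred

binary_bs = (p_yes - o_yes) ** 2   # the usual binary Brier score
two_class = (p_yes - o_yes) ** 2 + ((1 - p_yes) - (1 - o_yes)) ** 2

# The two-class sum equals twice the binary score for any p and o,
# since the complementary term mirrors the first one.
assert abs(two_class - 2 * binary_bs) < 1e-12
```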

Is multi class Brier still a proper scoring rule?

Yes. Like the binary version, multi class Brier is a proper scoring rule. Your best strategy is still to report your true probability vector, not to exaggerate confidence for appearance.

Benchmarks for multi class Brier

Raw Brier values depend on the difficulty of the dataset. For fair comparisons, define a benchmark and compute skill against it.

Base rate benchmark

Use the base rate vector per category. Example for 3 way matches:

• base rates: (A win 0.46, Draw 0.26, B win 0.28)

Score that vector the same way and compute a skill score analogous to the binary Brier skill score: skill = 1 - BS_forecast / BS_baseline.
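A sketch of that skill computation against the base-rate vector above (single match shown only to illustrate the mechanics; in practice, average the Brier score over many forecasts before computing skill):

```python
def multiclass_brier(probs, outcome):
    return sum((p - o) ** 2 for p, o in zip(probs, outcome))

def skill_score(forecast_bs, baseline_bs):
    """1 - BS_forecast / BS_baseline: 1 is perfect, 0 matches the
    baseline, negative is worse than the baseline."""
    return 1.0 - forecast_bs / baseline_bs

base_rates = [0.46, 0.26, 0.28]   # A win, Draw, B win
forecast = [0.50, 0.20, 0.30]
outcome = [0.0, 0.0, 1.0]         # B win

bss = skill_score(multiclass_brier(forecast, outcome),
                  multiclass_brier(base_rates, outcome))
# Slightly positive here: 0.78 vs the baseline's 0.7976.
```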

Market implied benchmark

If you have a liquid market or odds feed, you can convert odds to an implied probability vector and benchmark against the market, but only if you handle the vig (the bookmaker's margin) and liquidity cleanly.
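One simple way to build that vector is proportional normalization of the inverse odds; this is only one de-vigging convention among several, and the odds below are made up for illustration:

```python
def implied_probs(decimal_odds):
    """Convert decimal odds to an implied probability vector.

    Removes the vig by proportional normalization: the raw inverse
    odds sum to more than 1 whenever the book holds a margin, so we
    rescale them to sum to exactly 1.
    """
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)          # > 1 when the book holds vig
    return [r / overround for r in raw]

market = implied_probs([2.10, 3.60, 3.40])   # hypothetical A / Draw / B odds
```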

Common mistakes

Forgetting the sum to 1 constraint: multi class probabilities must sum to 1. If they do not, your scoring and calibration views become hard to interpret.
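A small validation guard catches this before scoring (a sketch; the tolerance is a judgment call):

```python
def validate_probs(probs, tol=1e-6):
    """Raise if probs is not a valid probability vector."""
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("each probability must lie in [0, 1]")
    if abs(sum(probs) - 1.0) > tol:
        raise ValueError(f"probabilities sum to {sum(probs):.4f}, not 1")

validate_probs([0.50, 0.20, 0.30])   # passes silently
```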

Not disclosing normalization: dividing by K changes the scale. Comparisons across platforms are meaningless without disclosure.

Comparing different K directly: a 3 class task and a 10 class task have different baseline difficulty. Use a benchmark and report skill, not just raw BS.

Takeaway

Multi class Brier is squared error across a probability vector. It is still proper, still interpretable, and works well for outcomes like win/draw/loss. The key is to disclose normalization and to benchmark against base rates or a clean market implied baseline so scorecards stay comparable.

Related

Brier Score

Proper Scoring Rule

Brier Score Calculator

Scorecard Methodology