
Brier Score Decomposition: Reliability, Resolution, Uncertainty

January 1, 2026 · Skill and Baselines

Why decompose Brier score

Brier score is one number: the mean squared difference between your probabilities and the 0/1 outcomes. It tells you how much error you had, but it does not tell you why.
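Here is a minimal Python sketch of the score. It assumes forecasts are probabilities in [0, 1] and outcomes are coded 1 (happened) or 0 (did not). The function name `brier_score` is illustrative.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between probability forecasts and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Same four outcomes, two very different forecasters:
# one hedges everything at 0.5, the other makes three perfect calls and one confident miss.
outcomes = [1, 0, 1, 0]
hedger = brier_score([0.5, 0.5, 0.5, 0.5], outcomes)
bold = brier_score([1.0, 0.0, 0.0, 0.0], outcomes)
print(hedger, bold)  # both are 0.25
```

Both forecasters land on 0.25, for completely different reasons. That is the gap the decomposition fills.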

Decomposition answers the practical question: is your score bad because you are miscalibrated, because your forecasts do not separate cases well, or because the questions are simply hard to call?

The decomposition in one line

For binary events, with forecasts grouped by their stated probability, Brier score can be decomposed as:

BS = Reliability - Resolution + Uncertainty

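Lower BS is better. Reliability is a penalty (smaller is better). Resolution is subtracted because it is a reward (larger is better). Uncertainty is set by the outcomes alone.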
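Here is a minimal Python sketch of the decomposition. It groups forecasts by their exact value. With that grouping the identity holds exactly. Grouping forecasts into probability ranges (the usual calibration bins) makes it only approximate. The function name `brier_decomposition` is illustrative, not from any library.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes.

    Forecasts are grouped by their exact value. With this grouping,
    reliability - resolution + uncertainty equals the Brier score exactly.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n  # overall frequency of the event
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)

    reliability = resolution = 0.0
    for p, group in groups.items():
        freq = sum(group) / len(group)  # how often the event happened when you said p
        reliability += len(group) * (p - freq) ** 2  # penalty: stated vs observed
        resolution += len(group) * (freq - base_rate) ** 2  # reward: group differs from base rate
    uncertainty = base_rate * (1 - base_rate)  # depends on outcomes only
    return reliability / n, resolution / n, uncertainty


forecasts = [0.9, 0.9, 0.9, 0.9, 0.2, 0.2, 0.2, 0.2, 0.2]
outcomes = [1, 1, 1, 0, 0, 0, 1, 0, 0]
rel, res, unc = brier_decomposition(forecasts, outcomes)
bs = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
print(f"REL={rel:.4f} RES={res:.4f} UNC={unc:.4f} sum={rel - res + unc:.4f} BS={bs:.4f}")
```

Printing the components next to a directly computed BS is a cheap sanity check. If the sum and BS ever disagree with exact-value grouping, there is a bug in the bookkeeping.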

Component 1: reliability (calibration error)

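Reliability is about calibration. Despite its name, it is a penalty term: it is zero for a perfectly calibrated forecaster and grows as your stated probabilities drift from observed frequencies.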

If you say 70% repeatedly, do those events happen about 70% of the time?

When reliability is poor, your score gets worse even if you have good signal, because you are mapping confidence incorrectly.

How to spot reliability problems

• calibration curve points are consistently above or below the diagonal

• clear patterns of overconfidence or underconfidence

• bucket means and realized frequencies are far apart in a calibration table
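A calibration table like the one in the last bullet takes only a few lines of Python. This sketch makes three assumptions. It uses ten equal-width bins. Its function names are hypothetical. Its binned reliability is only an approximation of the reliability term: it compares each bin's mean forecast with that bin's realized frequency and ignores spread within the bin.

```python
def calibration_table(forecasts, outcomes, n_bins=10):
    """Return (mean forecast, observed frequency, count) for each non-empty bin.

    Bins are equal-width on [0, 1]; a forecast of exactly 1.0 goes in the top bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    table = []
    for members in bins:
        if members:
            count = len(members)
            mean_p = sum(p for p, _ in members) / count
            freq = sum(o for _, o in members) / count
            table.append((mean_p, freq, count))
    return table


def binned_reliability(table):
    """Count-weighted squared gap between bucket mean and realized frequency."""
    total = sum(count for _, _, count in table)
    return sum(count * (mean_p - freq) ** 2 for mean_p, freq, count in table) / total


# An overconfident history: 0.9 calls come true 6 times in 10, 0.1 calls 4 times in 10.
forecasts = [0.9] * 10 + [0.1] * 10
outcomes = [1] * 6 + [0] * 4 + [0] * 6 + [1] * 4
table = calibration_table(forecasts, outcomes)
for mean_p, freq, count in table:
    print(f"said {mean_p:.2f}  happened {freq:.2f}  (n={count})")
print("reliability:", round(binned_reliability(table), 4))
```

Both buckets sit 0.30 away from their realized frequency, toward the extremes. That is the overconfidence pattern from the list above.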

How to improve reliability

• use probability mapping (shrink or stretch) based on your buckets

• start from base rates and move away only with evidence

• avoid mixing very different categories or horizons in one calibration view
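"Shrink or stretch" can be implemented in many ways. One simple option is a single factor applied to each forecast's log-odds, chosen to minimize Brier score on past forecasts. This is a swapped-in technique used only for illustration, not a prescribed method. The names `scale_log_odds` and `fit_factor` and the grid of candidate factors are hypothetical.

```python
import math


def scale_log_odds(p, factor, eps=1e-6):
    """Map p through its log-odds: factor < 1 shrinks toward 0.5, factor > 1 stretches away."""
    p = min(max(p, eps), 1 - eps)  # keep log-odds finite at 0 and 1
    log_odds = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-factor * log_odds))


def fit_factor(forecasts, outcomes, grid=None):
    """Pick the factor (from a simple grid) that minimizes Brier score on past forecasts."""
    if grid is None:
        grid = [x / 20 for x in range(4, 41)]  # 0.20, 0.25, ..., 2.00

    def score(factor):
        mapped = [scale_log_odds(p, factor) for p in forecasts]
        return sum((q - o) ** 2 for q, o in zip(mapped, outcomes)) / len(outcomes)

    return min(grid, key=score)


# An overconfident history: 0.9 calls hit 60% of the time, 0.1 calls hit 40%.
past_forecasts = [0.9] * 10 + [0.1] * 10
past_outcomes = [1] * 6 + [0] * 4 + [0] * 6 + [1] * 4
factor = fit_factor(past_forecasts, past_outcomes)
print(factor, round(scale_log_odds(0.9, factor), 3))
```

A factor below 1 means the history calls for shrinking. Fitting on the same forecasts you then score will flatter the result. In practice you would fit on past forecasts and apply the factor to new ones, with enough history per bucket for the fit to mean something. Grid search keeps the sketch dependency-free. A numerical optimizer would do a similar job.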

Component 2: resolution (discrimination)

Resolution measures how well you separate cases that resolve differently.

A forecaster with high resolution assigns different probabilities to different situations in a way that correlates with outcomes.

Intuition: if you give every question about the same probability (say 0.50), resolution is near zero, even if you are calibrated.

Resolution vs sharpness

Sharpness is how spread out your probabilities are. Resolution is whether that spread is justified by outcomes.

You can be sharp without resolution (bold, but your bold calls do no better than chance). You can be calibrated without resolution (safe but uninformative).
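To see sharpness and resolution pull apart, here is a small Python comparison of three hypothetical forecasters on the same eight outcomes. Sharpness is measured as the mean distance of forecasts from 0.5. That is only one of several conventions; forecast variance is another. Forecasts are grouped by exact value, and all names are illustrative.

```python
from collections import defaultdict


def components(forecasts, outcomes):
    """Return (sharpness, reliability, resolution) with forecasts grouped by exact value."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    # Sharpness here = mean distance from 0.5 (one of several conventions).
    sharpness = sum(abs(p - 0.5) for p in forecasts) / n
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)
    reliability = resolution = 0.0
    for p, group in groups.items():
        freq = sum(group) / len(group)
        reliability += len(group) * (p - freq) ** 2 / n
        resolution += len(group) * (freq - base_rate) ** 2 / n
    return sharpness, reliability, resolution


outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
forecasters = {
    "cautious (all 0.5)": [0.5] * 8,
    "bold, no signal": [0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1],
    "moderate, real signal": [0.8, 0.8, 0.8, 0.2, 0.8, 0.2, 0.2, 0.2],
}
results = {name: components(f, outcomes) for name, f in forecasters.items()}
for name, (sharp, rel, res) in results.items():
    print(f"{name:24s} sharpness={sharp:.2f} reliability={rel:.4f} resolution={res:.4f}")
```

The cautious forecaster is perfectly calibrated and has zero resolution. The bold forecaster is the sharpest of the three, yet it also has zero resolution, because its 0.9 and 0.1 groups resolve yes at the same rate. The moderate forecaster is less sharp but earns real resolution. Resolution ignores direction, so a forecaster who is consistently backwards would also show high resolution. The reliability term is what punishes that.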

How to improve resolution

• forecast in categories where you have real signal

• build consistent update rules so probabilities move with evidence

• track forecast distribution and make sure you are not stuck near 0.50
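For the last bullet, here is a rough Python sketch of tracking the forecast distribution. The ten equal-width bins, the ±0.1 band around 0.50, and the function names are arbitrary choices for illustration.

```python
def forecast_histogram(forecasts, n_bins=10):
    """Count forecasts in equal-width probability bins on [0, 1]."""
    counts = [0] * n_bins
    for p in forecasts:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


def share_near_half(forecasts, band=0.1):
    """Fraction of forecasts within `band` of 0.5 (the band width is an arbitrary choice)."""
    return sum(abs(p - 0.5) <= band for p in forecasts) / len(forecasts)


forecasts = [0.45, 0.5, 0.55, 0.58, 0.5, 0.42, 0.52, 0.85, 0.48, 0.5]
print(forecast_histogram(forecasts))
print(f"{share_near_half(forecasts):.0%} of forecasts are within 0.1 of 0.50")
```

A histogram piled up in the middle bins is a prompt to check whether you are hedging by habit rather than because the evidence is genuinely balanced.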

Component 3: uncertainty (base rate difficulty)

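Uncertainty reflects how intrinsically hard the question set is, given its outcome mix. It equals base rate × (1 − base rate), where the base rate is the fraction of questions that resolved yes.

When outcomes are near 50/50, uncertainty is at its maximum of 0.25. When outcomes are very lopsided, it is lower (0.09 at 90/10).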

This component depends only on the outcomes in the dataset; your forecasts do not enter it at all.
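Here is a quick Python check of the uncertainty term, assuming outcomes are coded 1/0. It also shows a useful fact: uncertainty equals the Brier score of a forecaster who always predicts the observed base rate. That base rate is known only in hindsight. The function names are illustrative.

```python
def uncertainty(outcomes):
    """Base-rate term: base_rate * (1 - base_rate)."""
    base_rate = sum(outcomes) / len(outcomes)
    return base_rate * (1 - base_rate)


def climatology_brier(outcomes):
    """Brier score of always forecasting the observed base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    return sum((base_rate - o) ** 2 for o in outcomes) / len(outcomes)


balanced = [1] * 50 + [0] * 50  # 50/50 question set
lopsided = [1] * 10 + [0] * 90  # 10/90 question set
for name, outs in [("balanced", balanced), ("lopsided", lopsided)]:
    print(name, round(uncertainty(outs), 4), round(climatology_brier(outs), 4))
```

The lopsided set gives an uninformed forecaster a much easier score than the balanced one. No forecasting skill is involved in that difference.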

Why uncertainty matters for comparisons

Because question sets differ, raw BS is not automatically comparable across datasets. This is one reason a Brier skill score against a benchmark is so useful: it measures your score relative to a reference forecast on the same questions.
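The link between the decomposition and Brier skill score can be written out under one assumption: the benchmark is climatology, meaning you always forecast the observed base rate ō on the same questions. Write REL, RES, and UNC for the three terms. For that constant forecast, reliability and resolution are both zero, so its Brier score equals uncertainty:

```latex
\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{ref}}},
\qquad
\mathrm{BS}_{\mathrm{clim}} = \mathrm{UNC} = \bar{o}\,(1 - \bar{o})

\mathrm{BSS}_{\mathrm{clim}}
  = 1 - \frac{\mathrm{REL} - \mathrm{RES} + \mathrm{UNC}}{\mathrm{UNC}}
  = \frac{\mathrm{RES} - \mathrm{REL}}{\mathrm{UNC}}
```

So skill against climatology is positive only when resolution earns more than reliability costs. This simplification holds for in-sample climatology. Other benchmarks, such as a crowd or model forecast, do not reduce this neatly.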

How to use the decomposition on a scorecard

Use the decomposition like a diagnostic:

• If reliability is the problem, focus on calibration fixes (mapping, priors, bucket review).

• If resolution is the problem, focus on better differentiation (better signals, better category focus, better updates).

• If uncertainty makes up most of your BS, do not overinterpret BS comparisons across different datasets.
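One hedged way to put the three numbers on a scorecard is to divide the two forecaster-controlled terms by uncertainty. This is the same normalization the Brier skill score against climatology uses, and it makes question sets of different difficulty easier to compare. The helper `scorecard` is a hypothetical sketch, and the input values are made up for illustration.

```python
def scorecard(reliability, resolution, uncertainty):
    """Summarize the decomposition with terms scaled by uncertainty."""
    return {
        "brier": reliability - resolution + uncertainty,
        # Same as Brier skill score against the base-rate (climatology) forecast.
        "skill_vs_climatology": (resolution - reliability) / uncertainty,
        "reliability_share": reliability / uncertainty,  # calibration cost, relative
        "resolution_share": resolution / uncertainty,  # discrimination earned, relative
    }


# Made-up component values for illustration.
card = scorecard(reliability=0.04, resolution=0.02, uncertainty=0.24)
for key, value in card.items():
    print(f"{key:22s} {value:+.3f}")
```

In this made-up case, reliability costs more than resolution earns. Skill against the base-rate forecast is therefore negative, and calibration fixes come first.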

Common mistakes

Treating BS as pure skill: BS mixes your behavior with dataset difficulty.

Fixing the wrong thing: if you are miscalibrated, being “sharper” can make your score worse.

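Ignoring sample size: the decomposition is noisy with small samples. With few forecasts per group, even a well-calibrated forecaster shows a nonzero reliability term, and the reliability and resolution estimates shift with how you bin forecasts.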
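The small-sample problem is easy to see in simulation. This Python sketch is an illustration built on stated assumptions, not a standard test. It draws forecasts from nine levels (0.1 to 0.9) and makes each event happen with exactly its forecast probability, so the forecaster is perfectly calibrated by construction. It then averages the reliability term over repeated trials. All names and settings are hypothetical.

```python
import random
from collections import defaultdict


def reliability_term(forecasts, outcomes):
    """Reliability with forecasts grouped by exact value."""
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)
    n = len(forecasts)
    return sum(len(g) * (p - sum(g) / len(g)) ** 2 for p, g in groups.items()) / n


def mean_reliability(n_forecasts, trials=200, seed=0):
    """Average reliability of a perfectly calibrated simulated forecaster."""
    rng = random.Random(seed)
    levels = [k / 10 for k in range(1, 10)]  # forecasts 0.1 ... 0.9
    total = 0.0
    for _ in range(trials):
        forecasts = [rng.choice(levels) for _ in range(n_forecasts)]
        # Calibrated by construction: each event happens with exactly its forecast probability.
        outcomes = [1 if rng.random() < p else 0 for p in forecasts]
        total += reliability_term(forecasts, outcomes)
    return total / trials


small, large = mean_reliability(20), mean_reliability(1000)
print(f"n=20: {small:.4f}   n=1000: {large:.4f}")
```

The reliability term is clearly above zero at n=20 and shrinks as n grows, even though the forecaster never changed. Treat reliability from a few dozen forecasts as a rough signal, not a verdict.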

Takeaway

Brier score can be explained as reliability minus resolution plus uncertainty. Reliability is calibration quality. Resolution is how well you separate cases. Uncertainty is dataset difficulty. Use the decomposition to target the right improvement, not just chase a lower headline score.
