Brier Skill Score

Brier skill score (BSS) measures how much better or worse your Brier score is than that of a baseline forecast. Higher is better: 1 is perfect, 0 matches the baseline, and negative values are worse than the baseline.

Definition

Brier skill score (BSS) expresses forecasting performance relative to a benchmark. It turns a raw Brier score into a “skill” metric by comparing your error to the error of a baseline forecast.

Formula

BSS = 1 - (BS / BS_baseline)

Where BS is your Brier score and BS_baseline is the Brier score of the chosen benchmark, both computed on the same set of questions. BSS is undefined when BS_baseline is 0.
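
The formula can be sketched in a few lines of Python. This is a minimal illustration, assuming binary questions scored with the common mean-squared-error form of the Brier score. The helper names `brier_score` and `brier_skill_score` and the toy data are illustrative assumptions, not part of any particular library.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(bs: float, bs_baseline: float) -> float:
    """BSS = 1 - BS / BS_baseline; undefined when the baseline scores exactly 0."""
    if bs_baseline == 0:
        raise ValueError("BSS is undefined when BS_baseline is 0")
    return 1.0 - bs / bs_baseline


outcomes = [1, 0, 1, 1]            # hypothetical resolved questions
forecasts = [0.8, 0.3, 0.6, 0.9]   # hypothetical forecast probabilities
baseline = [0.5] * len(outcomes)   # 50/50 benchmark

bs = brier_score(forecasts, outcomes)
bs_baseline = brier_score(baseline, outcomes)
bss = brier_skill_score(bs, bs_baseline)
print(round(bs, 3), round(bs_baseline, 3), round(bss, 3))  # 0.075 0.25 0.7
```

On this toy data the forecaster's error is 30% of the baseline's error, so the skill score is 0.7.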

How to interpret it

1.00 means perfect forecasting (BS = 0).

0.00 means you are exactly as good as the baseline.

Negative means you are worse than the baseline.

Choosing a baseline

Common baselines include:

50/50 for all questions (simple but often unrealistic).

Base rate (“climatology”): use the empirical base rate of outcomes for the dataset.

Platform consensus: use the market or community consensus forecast (for market-based evaluation).
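
The choice of baseline changes the resulting score even when the forecasts stay the same. The sketch below scores one hypothetical forecaster against all three baselines listed above. It assumes binary questions and the mean-squared-error form of the Brier score. All data, including the `consensus` values, are made up for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


outcomes = [1, 0, 1, 1]            # hypothetical resolved questions
forecasts = [0.8, 0.3, 0.6, 0.9]   # hypothetical forecaster
consensus = [0.7, 0.4, 0.6, 0.8]   # hypothetical platform consensus

# Empirical base rate of this toy dataset (0.75).
base_rate = sum(outcomes) / len(outcomes)

baselines = {
    "50/50": [0.5] * len(outcomes),
    "base rate": [base_rate] * len(outcomes),
    "consensus": consensus,
}

bs = brier_score(forecasts, outcomes)
skill = {
    name: 1.0 - bs / brier_score(probs, outcomes)
    for name, probs in baselines.items()
}

for name, bss in skill.items():
    print(f"{name}: BSS = {bss:.3f}")
# 50/50: BSS = 0.700
# base rate: BSS = 0.600
# consensus: BSS = 0.333
```

The stronger the baseline, the lower the skill score for the same forecasts, so a reported BSS is only meaningful alongside the baseline it was computed against.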

Why it matters

Raw Brier score depends on the mix of questions you forecast: a set of lopsided, easy-to-call questions yields low scores even for unskilled forecasts. BSS makes results more comparable across datasets by anchoring to a benchmark, which is especially useful for leaderboards and long-running evaluation programs.
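
To illustrate the sensitivity to question mix, the sketch below scores a forecaster who simply repeats the base rate on two hypothetical datasets, one lopsided and one balanced. It assumes binary questions, the mean-squared-error form of the Brier score, and a base-rate baseline. The helper `bss_vs_base_rate` is an illustrative name.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def bss_vs_base_rate(probs, outcomes):
    """Skill score against a constant forecast equal to the dataset's base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    bs_baseline = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / bs_baseline


lopsided = [1] + [0] * 9   # hypothetical dataset: 10% of questions resolve yes
balanced = [1, 0] * 5      # hypothetical dataset: 50% of questions resolve yes

# The same no-skill strategy on both datasets: always forecast the base rate.
lopsided_forecasts = [0.1] * len(lopsided)
balanced_forecasts = [0.5] * len(balanced)

bs_lopsided = brier_score(lopsided_forecasts, lopsided)
bs_balanced = brier_score(balanced_forecasts, balanced)
bss_lopsided = bss_vs_base_rate(lopsided_forecasts, lopsided)
bss_balanced = bss_vs_base_rate(balanced_forecasts, balanced)

print(round(bs_lopsided, 3), round(bs_balanced, 3))    # 0.09 0.25
print(round(bss_lopsided, 3), round(bss_balanced, 3))  # 0.0 0.0
```

The raw scores differ by almost a factor of three although the strategy is equally unskilled in both cases, while BSS reports zero skill on both datasets.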

Related

To understand what raw BS is measuring, see Brier score. For calibration checks, see calibration and sharpness.