Brier Skill Score
Brier skill score (BSS) measures how much better (or worse) your Brier score is versus a baseline forecast. Higher is better: 1 is perfect, 0 matches the baseline, and negative is worse than baseline.
Definition
Brier skill score (BSS) expresses forecasting performance relative to a benchmark. It turns raw Brier score into a “skill” metric by comparing your error to a baseline forecast.
Formula
BSS = 1 - (BS / BS_baseline)
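Where BS is your Brier score and BS_baseline is the Brier score of the chosen benchmark, both computed on the same set of questions. BSS is undefined when BS_baseline is 0 (a perfect baseline).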
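As a minimal sketch of the formula, the snippet below computes BSS for a handful of binary questions. It assumes the Brier score is the mean squared difference between forecast probabilities and 0/1 outcomes. The function names and sample numbers are illustrative and not taken from any particular library.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) == 0 or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(bs: float, bs_baseline: float) -> float:
    """BSS = 1 - (BS / BS_baseline); undefined for a perfect baseline."""
    if bs_baseline == 0:
        raise ValueError("BSS is undefined when BS_baseline is 0")
    return 1.0 - bs / bs_baseline


# Illustrative data: four resolved binary questions.
outcomes = [1, 0, 1, 1]
forecast = [0.8, 0.3, 0.6, 0.9]   # your probabilities
baseline = [0.5, 0.5, 0.5, 0.5]   # 50/50 benchmark

bs = brier_score(forecast, outcomes)            # about 0.075
bs_baseline = brier_score(baseline, outcomes)   # 0.25
bss = brier_skill_score(bs, bs_baseline)        # about 0.70
print(f"BS={bs:.3f}  BS_baseline={bs_baseline:.3f}  BSS={bss:.2f}")
```

Both scores are computed on the same questions, so the ratio compares like with like.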
How to interpret it
• 1.00 means perfect forecasting (BS = 0).
• 0.00 means you are exactly as good as the baseline.
• Negative means you are worse than the baseline.
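The three cases above can be reproduced with a small sketch. It assumes binary outcomes and a 50/50 baseline; the helper name bss and the sample forecasts are hypothetical.

```python
def bss(probs, baseline_probs, outcomes):
    """Brier skill score of `probs` against `baseline_probs` on the same questions."""
    n = len(outcomes)
    bs = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
    bs_base = sum((b - o) ** 2 for b, o in zip(baseline_probs, outcomes)) / n
    return 1.0 - bs / bs_base


outcomes = [1, 0, 1, 0]
baseline = [0.5] * 4

perfect = bss([1.0, 0.0, 1.0, 0.0], baseline, outcomes)  # BS = 0, so BSS = 1
same = bss(baseline, baseline, outcomes)                 # BS = BS_baseline, so BSS = 0
worse = bss([0.2, 0.9, 0.3, 0.8], baseline, outcomes)    # confidently wrong, BSS < 0

print(perfect, same, round(worse, 2))
```

Note that BSS has no lower bound: the confidently wrong forecaster in this sketch scores well below -1.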
Choosing a baseline
Common baselines include:
• 50/50 for all questions (simple but often unrealistic).
• Base rate (“climatology”): use the empirical base rate of outcomes for the dataset.
• A platform consensus forecast, such as the market price (for market-based evaluation).
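To illustrate how much the choice of baseline matters, here is a rough sketch that scores one set of forecasts against all three baselines. The data are made up, the consensus values stand in for hypothetical market prices, and the base rate is computed from the same outcomes being scored (an in-sample simplification; a real evaluation might use a historical rate instead).

```python
def brier(probs, outcomes):
    """Mean squared difference between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


outcomes = [1, 1, 1, 0, 1]
forecast = [0.9, 0.8, 0.7, 0.2, 0.9]
consensus = [0.85, 0.7, 0.8, 0.3, 0.9]  # hypothetical market prices

n = len(outcomes)
base_rate = sum(outcomes) / n  # in-sample base rate (assumption, see lead-in)

baselines = {
    "fifty_fifty": [0.5] * n,
    "base_rate": [base_rate] * n,
    "consensus": consensus,
}

bs = brier(forecast, outcomes)
scores = {name: 1.0 - bs / brier(b, outcomes) for name, b in baselines.items()}

for name, value in scores.items():
    print(f"{name:12s} BSS = {value:.3f}")
```

In this example the same forecasts look strongest against 50/50 and weakest against the consensus, because a stronger baseline leaves less room to show skill.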
Why it matters
Brier score is sensitive to the mix of questions you forecast: a set of easy, lopsided questions yields lower raw scores than a set of close calls. BSS makes results more comparable across datasets by anchoring to a benchmark, which is especially useful for leaderboards and long-running evaluation programs.
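A small sketch of that sensitivity, using made-up datasets: a forecaster who only ever predicts the base rate gets a much lower raw Brier score on a lopsided question set than on an evenly split one, yet shows zero skill against the base-rate baseline on both. The variable names are illustrative.

```python
def brier(probs, outcomes):
    """Mean squared difference between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def base_rate_forecast(outcomes):
    """Predict the dataset's empirical base rate for every question."""
    rate = sum(outcomes) / len(outcomes)
    return [rate] * len(outcomes)


easy = [1] * 19 + [0]        # lopsided: 95% resolve yes
hard = [1] * 10 + [0] * 10   # evenly split

results = {}
for name, outcomes in {"easy": easy, "hard": hard}.items():
    probs = base_rate_forecast(outcomes)      # the forecaster
    baseline = base_rate_forecast(outcomes)   # the benchmark
    bs = brier(probs, outcomes)
    bss = 1.0 - bs / brier(baseline, outcomes)
    results[name] = (bs, bss)
    print(f"{name}: BS = {bs:.4f}, BSS = {bss:.2f}")
```

The raw scores differ by roughly a factor of five even though the forecaster adds nothing beyond the base rate in either case, which is what the BSS of 0 reports.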
Related
To understand what raw BS is measuring, see Brier score. For calibration checks, see calibration and sharpness.