
Proper Scoring Rules: Why Honest Probabilities Win

January 1, 2026 Basics

What a proper scoring rule is

A proper scoring rule is a scoring method where your best strategy is to report your true probability belief.

In other words, if you believe an event is 65% likely, you get the best expected score by reporting 0.65, not by shading it up or down to look confident.

Why “proper” matters

Without proper scoring, people can game the system by reporting probabilities that look good rather than probabilities that are accurate.

Proper scoring pushes the game toward:

• honest probabilities

• measurable calibration

• less incentive for performative confidence

Brier score is proper

Brier score uses squared error:

(p - o)^2

Where p is the probability you reported for YES, and o is 1 for YES and 0 for NO. Lower is better.

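If you think the true chance is 0.65, the expected squared error is minimized by reporting 0.65. To see why, suppose your belief is q and you report r. Your expected score is q(1 - r)^2 + (1 - q)r^2, which simplifies to (r - q)^2 + q(1 - q). The second term does not depend on your report, so the total is smallest when r = q.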

This is the core “proper” property.
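The claim can be checked numerically. The sketch below is only an illustration: the function name expected_brier and the grid of candidate reports are assumptions made for this example, not part of any scoring library.

```python
def expected_brier(report: float, belief: float) -> float:
    """Expected squared error of `report` if YES happens with probability `belief`."""
    # With probability `belief` the outcome is 1, otherwise it is 0.
    return belief * (report - 1.0) ** 2 + (1.0 - belief) * report ** 2


belief = 0.65
# Candidate reports 0.00, 0.01, ..., 1.00 (grid resolution is an arbitrary choice).
candidates = [i / 100 for i in range(101)]
best_report = min(candidates, key=lambda r: expected_brier(r, belief))

print("best report:", best_report)
print("expected score if honest:", expected_brier(belief, belief))
print("expected score if shaded to 0.90:", expected_brier(0.90, belief))
```

Shading the report toward 0.90 raises the expected squared error, so looking more confident than you are costs you on average.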

Log loss is also proper

Log loss is based on the probability assigned to what happens:

• YES outcome: -log(p)

• NO outcome: -log(1 - p)

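It is also proper, but it punishes confident wrong calls much more heavily than Brier. The Brier penalty for a single forecast never exceeds 1, while log loss grows without bound as the probability you assigned to the actual outcome approaches 0.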
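A small comparison makes the difference concrete. This is a sketch under stated assumptions: it uses the natural logarithm (the text above does not fix a base), and the helper names brier, log_loss, and expected_log_loss are chosen for this example.

```python
import math


def brier(p: float, o: int) -> float:
    """Squared error between reported probability p and outcome o (1 = YES, 0 = NO)."""
    return (p - o) ** 2


def log_loss(p: float, o: int) -> float:
    """Negative natural log of the probability assigned to what happened."""
    return -math.log(p) if o == 1 else -math.log(1.0 - p)


def expected_log_loss(report: float, belief: float) -> float:
    """Expected log loss of `report` if YES happens with probability `belief`."""
    return belief * log_loss(report, 1) + (1.0 - belief) * log_loss(report, 0)


# Increasingly confident YES forecasts on a question that resolves NO.
for p in (0.6, 0.9, 0.99, 0.999):
    print(p, round(brier(p, 0), 4), round(log_loss(p, 0), 4))

# Properness check: the best report on a grid matches the belief.
belief = 0.65
candidates = [i / 100 for i in range(1, 100)]  # avoid 0 and 1, where log loss is infinite
best_report = min(candidates, key=lambda r: expected_log_loss(r, belief))
print("best report under log loss:", best_report)
```

The Brier column flattens out just below 1, while the log loss column keeps climbing as the wrong forecast gets more extreme.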

Proper does not mean perfect

Proper scoring rules help, but they do not solve every leaderboard problem.

Selection bias still exists

If users choose which questions to forecast, they can still inflate results by cherry picking easy questions. You must track coverage and set minimum activity rules.
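One way to make coverage and minimum activity concrete is sketched below. The function names and the thresholds (min_forecasts=20, min_coverage=0.5) are hypothetical values picked for illustration, not recommended settings.

```python
def coverage(n_forecasted: int, n_eligible: int) -> float:
    """Share of eligible questions the user actually forecast."""
    if n_eligible <= 0:
        raise ValueError("n_eligible must be positive")
    return n_forecasted / n_eligible


def is_ranked(
    n_forecasted: int,
    n_eligible: int,
    min_forecasts: int = 20,     # hypothetical minimum activity rule
    min_coverage: float = 0.5,   # hypothetical minimum coverage rule
) -> bool:
    """Return True if the user meets both the activity and coverage thresholds."""
    return (
        n_forecasted >= min_forecasts
        and coverage(n_forecasted, n_eligible) >= min_coverage
    )


print(coverage(30, 100), is_ranked(30, 100))
print(coverage(60, 100), is_ranked(60, 100))
```

Publishing the coverage number next to the score lets readers see how selective a forecaster was, even when the user clears the thresholds.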

Timing can still dominate

If you score only the last update before settlement, you reward forecasters who wait until the outcome is nearly known. Use evaluation checkpoints to score everyone at the same horizon.
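A checkpoint rule can be as simple as "use the latest forecast made at or before the checkpoint". The sketch below assumes updates are stored as (time, probability) pairs with any consistent time unit. The function name and data layout are assumptions for this example.

```python
from typing import List, Optional, Tuple


def forecast_at_checkpoint(
    updates: List[Tuple[float, float]], checkpoint: float
) -> Optional[float]:
    """Return the latest probability submitted at or before `checkpoint`.

    `updates` holds (time, probability) pairs in any order.
    Returns None if the user had not forecast by the checkpoint.
    """
    eligible = [(t, p) for t, p in updates if t <= checkpoint]
    if not eligible:
        return None
    # Pick the pair with the largest timestamp and return its probability.
    return max(eligible, key=lambda tp: tp[0])[1]


# Hypothetical update history: the last update arrives just before settlement.
updates = [(1.0, 0.55), (5.0, 0.70), (9.5, 0.97)]
print(forecast_at_checkpoint(updates, checkpoint=7.0))
print(forecast_at_checkpoint(updates, checkpoint=0.5))
```

With a checkpoint at time 7.0, the late 0.97 update is ignored and the forecaster is scored on the 0.70 they held at that horizon.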

How proper scoring interacts with calibration

Because Brier and log loss are proper, your best long-run move is to fix your calibration instead of trying to look confident.

Typical patterns:

• overconfidence gets punished hard, especially under log loss

• underconfidence also costs you, because you fail to use signal

A practical intuition

Think of a probability as a claim about frequency.

If you say 70% on many questions, then about 70% of those events should happen. Proper scoring rules reward you when that is true and penalize you when it is not.
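The frequency check can be sketched as a simple calibration table: group forecasts into probability bins and compare the average forecast in each bin with how often the event happened. The function name, the equal-width binning, and the sample data are assumptions for this example.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def calibration_table(
    forecasts: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean forecast, observed YES frequency, count) per non-empty bin."""
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        # Equal-width bins; p == 1.0 goes into the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(o for _, o in pairs) / len(pairs)
        rows.append((mean_p, freq, len(pairs)))
    return rows


# Hypothetical data: ten 70% forecasts, seven of which resolved YES.
forecasts = [0.7] * 10
outcomes = [1] * 7 + [0] * 3
for mean_p, freq, count in calibration_table(forecasts, outcomes):
    print(round(mean_p, 2), round(freq, 2), count)
```

When the mean forecast and the observed frequency in a bin are close, the forecaster is well calibrated in that range. Large gaps point to overconfidence or underconfidence.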

Common mistakes

Mistake: treating “proper” as “ungameable”

Proper means honest reporting is optimal on each question you forecast. It does not stop cherry picking or last-minute forecasting. That is why coverage and checkpoints matter.

Mistake: mixing metrics without methodology

If you publish Brier score, Brier Skill Score (BSS), and log loss, document how you score updates, how you handle missing forecasts, and whether you apply probability clipping.
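Probability clipping matters mostly for log loss, because a forecast of exactly 0 or 1 that turns out wrong has an infinite penalty. The sketch below shows one common form of clipping. The function name and the default eps=0.01 are hypothetical choices for this example, and natural log is assumed.

```python
import math


def clipped_log_loss(p: float, o: int, eps: float = 0.01) -> float:
    """Log loss after clamping p into [eps, 1 - eps].

    `eps` is a hypothetical clipping bound; whichever value you use
    should be stated in your methodology.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p) if o == 1 else -math.log(1.0 - p)


# A forecast of 0.0 on a question that resolves YES would otherwise be infinite.
print(clipped_log_loss(0.0, 1))
print(clipped_log_loss(0.65, 1))
```

Because the clipping bound sets the maximum possible penalty, two leaderboards with different bounds are not directly comparable, which is why the bound belongs in the published methodology.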

Takeaway

Proper scoring rules are the foundation of fair forecasting evaluation because they reward honest probabilities. Brier and log loss are both proper, but they penalize mistakes differently. Use proper scoring with coverage and checkpoints to get scorecards that reflect real skill.

Related

Proper Scoring Rule

Brier Score

Log Loss

Log Loss vs Brier: Which One Punishes You More and Why

Selection Bias and Coverage

Evaluation Checkpoints