Sharpness vs Calibration: Being Bold Without Being Wrong
Two different goals
People often treat confidence as the goal. It is not.
Calibration asks: do your probabilities mean what they say?
Sharpness asks: do you make meaningful, differentiated forecasts, or do you live near 50%?
What sharpness is
Sharpness is about the distribution of your forecasts, not outcomes. A sharp forecaster uses a wide range of probabilities when evidence supports it.
One simple diagnostic is forecast distribution. If most of your forecasts are between 0.45 and 0.55, sharpness is low.
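This diagnostic is easy to compute. A minimal sketch (the 0.45–0.55 band and the use of standard deviation as a spread measure are illustrative choices, not a standard):

```python
# Sharpness diagnostic: what fraction of forecasts sit in the
# uninformative 0.45-0.55 band, and how spread out are they overall?

def sharpness_report(forecasts):
    n = len(forecasts)
    near_half = sum(1 for p in forecasts if 0.45 <= p <= 0.55) / n
    mean = sum(forecasts) / n
    spread = (sum((p - mean) ** 2 for p in forecasts) / n) ** 0.5
    return {"near_half_share": near_half, "spread": spread}

print(sharpness_report([0.50, 0.52, 0.48, 0.51, 0.49]))  # low sharpness
print(sharpness_report([0.10, 0.85, 0.30, 0.95, 0.60]))  # higher sharpness
```

A high `near_half_share` or a small `spread` both point to a forecaster who rarely commits to differentiated probabilities.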
What calibration is
Calibration is about whether, in the long run, your probabilities match observed frequencies. It is measured with a calibration table and a calibration curve.
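A calibration table can be built in a few lines. This is a sketch: the ten equal-width buckets are a common convention, but any reasonable binning works.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, n_buckets=10):
    """Group forecasts into buckets; compare average forecast to observed frequency."""
    buckets = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        b = min(int(p * n_buckets), n_buckets - 1)  # e.g. 0.72 -> bucket 7
        buckets[b].append((p, y))
    table = {}
    for b, pairs in sorted(buckets.items()):
        ps = [p for p, _ in pairs]
        ys = [y for _, y in pairs]
        table[b] = {
            "count": len(pairs),
            "avg_forecast": sum(ps) / len(ps),
            "observed_freq": sum(ys) / len(ys),
        }
    return table

table = calibration_table([0.72, 0.74, 0.78, 0.22, 0.28], [1, 1, 0, 0, 0])
for b, row in table.items():
    print(b, row)
```

When `avg_forecast` and `observed_freq` track each other across buckets, the forecaster is calibrated; plotting one against the other gives the calibration curve.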
Why you need both
Calibrated but not sharp: If you always forecast the base rate (for example, 50% on a pool of questions that resolve yes about half the time), you can be perfectly calibrated on average, but you are not informative.
Sharp but not calibrated: If you often forecast 90% and you are wrong too often, you will be punished heavily by Brier score and log loss.
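The asymmetry of that punishment is easy to see numerically. A sketch comparing a confident miss to a moderate miss (standard definitions of both scores, plain Python):

```python
import math

def brier(p, y):
    """Squared error of probability p against outcome y in {0, 1}."""
    return (p - y) ** 2

def log_loss(p, y):
    """Negative log-likelihood of the outcome under forecast p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Forecasting 0.90 vs 0.65 on an event that did NOT happen (y = 0):
print(brier(0.90, 0), brier(0.65, 0))        # ≈ 0.81 vs ≈ 0.42
print(log_loss(0.90, 0), log_loss(0.65, 0))  # ≈ 2.30 vs ≈ 1.05
```

The confident miss costs roughly twice as much under both rules, and log loss grows without bound as the forecast approaches certainty.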
How sharpness helps your score
When you are calibrated, sharper forecasts usually improve your score, because they reduce squared error on events where you have real signal.
This is one reason the Brier score decomposition separates reliability (calibration) from resolution (your ability to separate cases).
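That decomposition can be sketched directly. This is the discrete-forecast form of the Murphy decomposition, Brier = reliability − resolution + uncertainty, bucketing by exact forecast value:

```python
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Murphy decomposition: Brier = reliability - resolution + uncertainty."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    buckets = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        buckets[p].append(y)  # bucket by exact forecast value
    reliability = sum(len(ys) * (p - sum(ys) / len(ys)) ** 2
                      for p, ys in buckets.items()) / n
    resolution = sum(len(ys) * (sum(ys) / len(ys) - base_rate) ** 2
                     for ys in buckets.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty

forecasts = [0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2]
outcomes  = [1,   1,   1,   0,   0,   0,   1,   0]
rel, res, unc = brier_decomposition(forecasts, outcomes)
total = sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)
print(rel, res, unc)  # total Brier equals rel - res + unc
```

Lower reliability means better calibration; higher resolution means your buckets genuinely separate cases with different outcome rates. Sharpness buys you nothing unless it shows up as resolution.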
How sharpness becomes dangerous
Sharpness becomes dangerous when it is not justified by evidence.
Two common failure modes:
• overconfidence in high probability buckets
• ignoring the base rate and jumping to extremes on thin evidence
Practical ways to increase sharpness safely
1) Start from base rates
Use the base rate as a prior, then move away from it gradually as evidence accumulates. This avoids extreme overreaction.
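One disciplined way to "move gradually" is to update in log-odds space, where independent pieces of evidence add. The step sizes below are illustrative assumptions, not a standard rule:

```python
import math

def update(prior_p, log_odds_shift):
    """Shift a probability by `log_odds_shift` in log-odds space."""
    lo = math.log(prior_p / (1 - prior_p)) + log_odds_shift
    return 1 / (1 + math.exp(-lo))

p = 0.30            # base rate as the starting prior
p = update(p, 0.5)  # one moderate piece of supporting evidence
p = update(p, 0.5)  # a second, independent piece
print(round(p, 3))  # 0.538 -- a measured move, not a jump to 0.9
```

Because the shift is in log-odds, the same evidence moves a mid-range probability a lot and an extreme probability only a little, which is exactly the anti-overreaction behavior you want.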
2) Use consistent evidence thresholds
Decide what evidence justifies moving from 0.55 to 0.65, or from 0.70 to 0.85. Make this rule-based, not mood-based.
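Such a rule can be as simple as a pre-committed evidence ladder. The tiers and step sizes here are illustrative assumptions; the point is that the moves are decided in advance, not in the moment:

```python
# A pre-committed ladder of probability moves per evidence tier.
EVIDENCE_STEPS = {
    "weak":     0.05,  # e.g. a single secondhand report
    "moderate": 0.10,  # e.g. one reliable primary source
    "strong":   0.15,  # e.g. multiple independent confirmations
}

def apply_evidence(p, tier, supports=True):
    step = EVIDENCE_STEPS[tier]
    p = p + step if supports else p - step
    return min(max(p, 0.01), 0.99)  # keep away from 0 and 1

p = apply_evidence(0.55, "moderate")  # 0.55 -> 0.65
p = apply_evidence(p, "strong")       # 0.65 -> 0.80
print(round(p, 2))
```

Writing the ladder down before forecasting is what makes it rule-based: you can audit whether a given move was justified by the tier of evidence you actually had.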
3) Segment by category
Sharpness should differ by domain. Pooling unlike categories can make you look miscalibrated when you are actually mixing different regimes.
4) Review calibration per bucket
If your top buckets underperform, do not stop being sharp. Adjust the mapping. A simple fix is to compress probabilities toward the center (for example map 0.90 to 0.80) until the bucket becomes calibrated.
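A sketch of that compression fix, shrinking each forecast's distance from the center by a fixed factor. The factor is a tunable assumption; 0.75 is chosen here so that 0.90 maps to 0.80:

```python
def compress(p, shrink=0.75, center=0.5):
    """Shrink the distance from `center` by the factor `shrink`."""
    return center + shrink * (p - center)

print(round(compress(0.90), 2))  # 0.9 -> 0.8
print(round(compress(0.10), 2))  # symmetric: 0.1 -> 0.2
print(round(compress(0.50), 2))  # the center is a fixed point
```

Tune `shrink` per bucket against your calibration table: compress only as much as the observed frequencies demand, so you give back the minimum sharpness needed to restore calibration.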
Common mistakes
Mistake: treating sharpness as “bravery”
Sharpness is not about being bold. It is about being specific when you have signal.
Mistake: using extremes for “good looking” picks
Extreme predictions look impressive but are costly when wrong, especially under log loss.
Mistake: confusing a market move with evidence
Following the market can be rational, but if you herd into consensus without your own model, you may reduce resolution and hide weaknesses. See Herding.
Takeaway
Calibration tells you whether your probabilities are honest. Sharpness tells you whether they are informative. The goal is to be sharp and calibrated, which usually means starting from base rates, moving in measured steps, and using calibration feedback to correct your mapping over time.
Related
• Calibration Explained: Why 70 Percent Should Mean 70 Percent