Calibration

Calibration describes whether predicted probabilities match observed frequencies. A well-calibrated forecaster's 70% predictions come true about 70% of the time.

Definition

Calibration measures whether your predicted probabilities correspond to real-world frequencies. If you forecast many events at 70% and about 70% of them happen, you are well calibrated in that range.

How it is checked

A common approach, sketched in code below, is to group forecasts into probability “bins” (for example 0.50 to 0.60, 0.60 to 0.70, and so on) and compare:

• average predicted probability in the bin

• realized frequency of outcomes in the bin
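A minimal sketch of this binning check, assuming binary outcomes coded 0/1 and forecasts given as probabilities; the function name, bin count, and sample data are illustrative:

```python
def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into equal-width bins and compare the average
    predicted probability with the realized frequency in each bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the last bin
        bins[i].append((p, y))
    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue  # skip empty bins
        avg_pred = sum(p for p, _ in items) / len(items)
        freq = sum(y for _, y in items) / len(items)
        rows.append((i / n_bins, (i + 1) / n_bins, len(items), avg_pred, freq))
    return rows

# Toy data: eight forecasts and whether each event actually happened.
forecasts = [0.55, 0.58, 0.62, 0.65, 0.71, 0.74, 0.77, 0.81]
outcomes  = [0,    1,    1,    0,    1,    1,    0,    1]
for lo, hi, n, avg_pred, freq in calibration_table(forecasts, outcomes):
    print(f"[{lo:.1f}, {hi:.1f}): n={n}, predicted={avg_pred:.2f}, observed={freq:.2f}")
```

A well-calibrated forecaster's predicted and observed columns track each other closely in every bin that has enough data.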

Why it matters

Forecasting is not just about being right. A forecaster who always predicts 60% may have a decent Brier score but is underconfident if events in that bucket happen 80% of the time. Calibration tells you whether your probabilities are meaningful.
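To make that concrete: averaging the squared error over outcomes, a constant forecast p against a true frequency b has expected Brier score b(1 − p)² + (1 − b)p². A quick check (numbers illustrative) shows the calibrated 80% forecast beats the underconfident 60% one:

```python
def expected_brier(p, b):
    """Expected Brier score of a constant forecast p when events occur
    with true frequency b: b*(1-p)^2 + (1-b)*p^2."""
    return b * (1 - p) ** 2 + (1 - b) * p ** 2

for p in (0.6, 0.8):
    print(f"constant forecast {p:.0%}: expected Brier {expected_brier(p, 0.8):.3f}")
# constant forecast 60%: expected Brier 0.200
# constant forecast 80%: expected Brier 0.160  (lower is better)
```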

Calibration vs sharpness

Calibration is different from sharpness, which measures how far your forecasts stray from the base rate. You can be calibrated but uninformative: always forecasting the base rate (for instance, 50% on questions that resolve yes about half the time) is perfectly calibrated yet says nothing about which events will happen. Strong forecasters are both calibrated and sharp.
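Sharpness can be summarized in several ways; one simple convention, assumed here purely for illustration, is the mean squared distance of forecasts from 50%. Note it looks only at the forecasts, never the outcomes:

```python
def sharpness(forecasts):
    """Mean squared distance of forecasts from 0.5 (one simple convention,
    assumed for illustration). Always saying 50% scores 0."""
    return sum((p - 0.5) ** 2 for p in forecasts) / len(forecasts)

print(sharpness([0.5] * 8))                 # 0.0   — possibly calibrated, but blunt
print(sharpness([0.9, 0.1, 0.8, 0.2] * 2))  # 0.125 — far more informative
```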

Common pitfalls

Small samples: Calibration curves can mislead when each bin contains only a few events; the realized frequency in a five-event bin is mostly noise (see the sketch below). Use more data or fewer, wider bins.

Changing base rates: If the base rate shifts over time, calibration estimated on old data may not hold.
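For the small-samples pitfall, a rough rule of thumb is the binomial standard error of each bin's realized frequency; this sketch (names and numbers illustrative) shows why a five-event bin tells you very little:

```python
import math

def bin_uncertainty(freq, n):
    """Normal-approximation standard error of an observed frequency
    computed from n events: sqrt(freq * (1 - freq) / n)."""
    return math.sqrt(freq * (1 - freq) / n) if n > 0 else float("inf")

print(f"{bin_uncertainty(0.8, 5):.3f}")    # 0.179 — five events: mostly noise
print(f"{bin_uncertainty(0.8, 500):.3f}")  # 0.018 — enough data to read the curve
```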

Related

Calibration is one ingredient of a good Brier score and Brier skill score. For the strength of predictions, see sharpness.