
Sample Size

Sample size (N) is the number of scored forecasts. Larger samples make scores more stable and reduce noise in calibration buckets and comparisons.

Definition

Sample size (often shown as N) is the number of forecasts included in a score calculation.

Why it matters

With small N, scores can swing widely from random variation alone. This is especially true for:

• calibration analysis across probability buckets

• comparisons using Brier skill score

• metrics computed over a rolling window
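
To make the effect of N concrete, here is a minimal sketch of how the noise in an average Brier score shrinks as N grows. It rests on simplifying assumptions that are not part of this glossary: every forecast is the same probability p, the forecaster is perfectly calibrated, and outcomes are independent. The function name brier_standard_error is hypothetical.

```python
import math


def brier_standard_error(p: float, n: int) -> float:
    """Standard error of the mean Brier score over n independent forecasts.

    Assumes every forecast is the same probability p and the forecaster is
    perfectly calibrated, so each outcome is 1 with probability p.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # The per-forecast score is (1 - p)^2 with probability p and p^2 otherwise;
    # its variance simplifies to p * (1 - p) * (1 - 2p)^2.
    variance = p * (1 - p) * (1 - 2 * p) ** 2
    return math.sqrt(variance / n)


# Illustrative values only: the same forecaster scored on 25, 100, and 400 forecasts.
for n in (25, 100, 400):
    print(n, round(brier_standard_error(0.7, n), 4))
```

Under these assumptions the standard error at p = 0.7 is roughly 0.037 for N = 25, 0.018 for N = 100, and 0.009 for N = 400: quadrupling N only halves the noise.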

Practical guidance

• Always show N on scorecards.

• If buckets have very low counts, use fewer buckets or aggregate over longer windows.

• Compare forecasters only when their N values are of similar size, or report uncertainty alongside the scores.
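
The guidance above can be sketched in code. The example below is one possible approach rather than a prescribed method: it counts forecasts per calibration bucket so that sparse buckets are easy to spot, and it uses a percentile bootstrap (a technique chosen here for illustration) to report uncertainty around a Brier score. All function names, the default of 10 buckets, and the sample data are assumptions.

```python
import random
from collections import Counter


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def bucket_counts(forecasts, n_buckets=10):
    """Number of forecasts in each equal-width probability bucket."""
    # A forecast of exactly 1.0 is placed in the last bucket.
    counts = Counter(min(int(f * n_buckets), n_buckets - 1) for f in forecasts)
    return [counts.get(i, 0) for i in range(n_buckets)]


def bootstrap_interval(forecasts, outcomes, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the Brier score."""
    rng = random.Random(seed)
    n = len(forecasts)
    scores = []
    for _ in range(n_resamples):
        # Resample forecast/outcome pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            brier_score([forecasts[i] for i in idx], [outcomes[i] for i in idx])
        )
    scores.sort()
    lo = scores[int((1 - level) / 2 * n_resamples)]
    hi = scores[min(int((1 + level) / 2 * n_resamples), n_resamples - 1)]
    return lo, hi


# Made-up data for illustration: N = 8 forecasts and their resolved outcomes.
forecasts = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]

print("N =", len(forecasts))
print("bucket counts:", bucket_counts(forecasts))
print("Brier score:", round(brier_score(forecasts, outcomes), 3))
print("95% interval:", bootstrap_interval(forecasts, outcomes))
```

With so few forecasts, most buckets hold zero or one forecast and the interval is wide, which is the situation where using fewer buckets or a longer window helps.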

Related

Sample size is closely related to confidence intervals and to evaluation integrity topics like selection bias.