Sample Size
Sample size (N) is the number of scored forecasts. Larger samples make scores more stable and reduce noise in calibration buckets and comparisons.
Definition
Sample size (often shown as N) is the number of forecasts included in a score calculation.
Why it matters
With small N, a handful of surprising outcomes can move a score substantially, so an apparent difference may reflect chance rather than skill. This is especially true for:
• calibration analysis across probability buckets
• comparisons using Brier skill score
• rolling metrics, where a short rolling window may contain only a few forecasts
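As a rough illustration of why small N is noisy, the sketch below estimates the standard error of a mean Brier score from the per-forecast squared errors. The function name brier_with_standard_error is hypothetical, and the calculation assumes binary outcomes coded as 0 or 1 and treats forecasts as independent, which may not hold when questions are correlated.

```python
import math


def brier_with_standard_error(probs, outcomes):
    """Return (mean Brier score, standard error of the mean, N).

    Assumes binary outcomes (0 or 1) and independent forecasts.
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    n = len(probs)
    if n < 2:
        raise ValueError("need at least 2 scored forecasts to estimate spread")

    # Per-forecast squared error; the Brier score is their mean.
    errors = [(p - o) ** 2 for p, o in zip(probs, outcomes)]
    mean = sum(errors) / n

    # Sample variance of the squared errors (n - 1 in the denominator).
    variance = sum((e - mean) ** 2 for e in errors) / (n - 1)

    # Standard error of the mean shrinks as N grows.
    return mean, math.sqrt(variance / n), n


if __name__ == "__main__":
    score, se, n = brier_with_standard_error([0.9, 0.1, 0.7, 0.4], [1, 1, 1, 0])
    print(f"N={n} Brier={score:.3f} standard error={se:.3f}")
```

Because the standard error shrinks roughly with the square root of N, quadrupling the sample only about halves the noise.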
Practical guidance
• Always show N on scorecards.
• If buckets have very low counts, use fewer buckets or aggregate over longer windows.
• Compare forecasters only when their N is comparable, or report uncertainty (for example, a confidence interval) alongside each score.
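The guidance on sparse buckets can be sketched as a small check that counts forecasts per probability bucket and falls back to fewer buckets when any populated bucket is too thin. The function names, the candidate bucket counts, and the min_count threshold of 20 are all illustrative assumptions, not recommended values. Empty buckets are ignored here, which is one possible design choice.

```python
def bucket_counts(probs, n_buckets=10):
    """Count forecasts in equal-width probability buckets over [0, 1]."""
    counts = [0] * n_buckets
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # A forecast of exactly 1.0 goes into the last bucket.
        idx = min(int(p * n_buckets), n_buckets - 1)
        counts[idx] += 1
    return counts


def choose_bucket_count(probs, min_count=20, candidates=(10, 5, 4, 2)):
    """Return the first candidate where every non-empty bucket has
    at least min_count forecasts; fall back to a single bucket.

    min_count and candidates are hypothetical defaults for illustration.
    """
    for k in candidates:
        counts = bucket_counts(probs, k)
        if all(c == 0 or c >= min_count for c in counts):
            return k
    return 1


if __name__ == "__main__":
    forecasts = [0.05] * 30 + [0.55] * 30 + [0.95] * 3
    print("counts with 10 buckets:", bucket_counts(forecasts, 10))
    print("suggested bucket count:", choose_bucket_count(forecasts))
```

If even the coarsest bucketing fails the check, aggregating over a longer window is the other option mentioned above.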
Related
Sample size is closely related to confidence intervals and to evaluation integrity topics like selection bias.