Scorecard Methodology: What You Must Disclose
Why methodology disclosure matters
A forecasting scorecard looks simple: one number, such as a Brier score or a Brier skill score (BSS).
But without the methodology behind it, the number is meaningless, because small measurement choices can change it substantially.
If you want your scorecard to be credible, publish the rules that define the measurement.
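For reference, here is a minimal sketch of both metrics; the function names are illustrative, not from any particular library. The Brier score is the mean squared error between probability forecasts and 0/1 outcomes, and the skill score compares it to a benchmark:

```python
def brier_score(probs, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes, ref_probs):
    """BSS = 1 - BS / BS_ref. Positive means better than the benchmark;
    assumes the benchmark is not itself perfect (BS_ref > 0)."""
    return 1.0 - brier_score(probs, outcomes) / brier_score(ref_probs, outcomes)
```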
The minimum disclosure checklist
At a minimum, disclose these items in plain language:
1) What gets scored
• event type: binary only, or multi-class too
• eligible market set definition (categories, time range, exclusions)
• how you handle voids, disputes, and ambiguous outcomes (a filter sketch follows this list)
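To make eligibility concrete, here is a hypothetical filter; the field names and thresholds are assumptions for illustration, mirroring the example methodology box later in this piece:

```python
def eligible(market):
    # Hypothetical record layout: type, category, open_hours, status.
    return (
        market["type"] == "binary"                        # binary events only
        and market["category"] in {"A", "B"}              # eligible categories
        and market["open_hours"] >= 24                    # open at least 24h
        and market["status"] not in {"void", "disputed"}  # exclude unscorable
    )
```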
2) Which forecast gets scored
• do you score the final forecast before settlement,
• or do you apply an evaluation checkpoint rule (recommended)
• how you handle the absence of a forecast by the checkpoint: treat it as missing or fill a default (sketched below)
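A checkpoint rule can be stated in a few lines of code. This sketch assumes each forecast record carries a timestamp, and it returns None rather than silently filling a default when no forecast exists by the checkpoint:

```python
from datetime import timedelta

def forecast_at_checkpoint(forecasts, settle_time, lead=timedelta(hours=24)):
    """Last forecast made at or before the checkpoint (T-24h by default)."""
    checkpoint = settle_time - lead
    on_time = [f for f in forecasts if f["time"] <= checkpoint]
    # None means "missing" -- report it as such instead of filling a default.
    return max(on_time, key=lambda f: f["time"]) if on_time else None
```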
3) Outcome definition
• settlement source and how outcomes are mapped to 0 or 1
• how you handle partial outcomes or cancellations (one mapping policy is sketched below)
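One common policy, sketched here with hypothetical status values, is to map clean settlements to 0/1 and exclude voids and cancellations from scoring entirely:

```python
def map_outcome(settlement):
    """Return 1/0 for a clean settlement, or None to exclude the market."""
    if settlement["status"] in {"void", "cancelled", "disputed"}:
        return None  # excluded from scoring, and counted against coverage
    return 1 if settlement["result"] == "yes" else 0
```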
4) Benchmark definition (for BSS)
State which benchmark you use:
• 50/50
• base rate (recommended default)
• market consensus (only with liquidity rules)
If you use market consensus, disclose:
• last trade vs mid-price vs VWAP
• the consensus timestamp or window
• liquidity filters (spread cap, minimum volume, depth), as in the sketch below
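As one way to operationalize those rules, here is a mid-price consensus with simple liquidity filters; the thresholds are illustrative policy choices, and whatever you pick should be disclosed:

```python
def consensus_prob(quote, max_spread=0.05, min_volume=1000):
    """Mid-price consensus; returns None when the market is too thin."""
    spread = quote["ask"] - quote["bid"]
    if spread > max_spread or quote["volume"] < min_volume:
        return None  # do not pretend a thin market has a consensus
    return (quote["bid"] + quote["ask"]) / 2
```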
5) Sample size and coverage
Every headline score should display:
• sample size (N)
• coverage: the share of eligible markets actually scored
Otherwise you invite selection bias and apparent skill that is really selective reporting (see the sketch after this list).
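A headline formatter that refuses to separate the score from N and coverage is a cheap way to enforce this; a minimal sketch:

```python
def headline(brier, scored_n, eligible_n):
    """Format a headline score so N and coverage always travel with it."""
    cov = scored_n / eligible_n if eligible_n else 0.0
    return f"Brier {brier:.3f} (N={scored_n}, coverage={cov:.0%})"

# e.g. headline(0.182, 412, 530) -> 'Brier 0.182 (N=412, coverage=78%)'
```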
6) Time handling
• time zone
• how you handle forecast timestamps
• if you use horizons, how you bucket forecasts by horizon (bucketing is sketched below)
• whether you use rolling windows and what the window size is
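Horizon bucketing can be disclosed as a simple edge list. This sketch assumes timezone-aware UTC timestamps and illustrative bucket edges:

```python
def horizon_bucket(forecast_time, settle_time):
    """Bucket a forecast by lead time; the edges here are illustrative only.
    Both timestamps must be timezone-aware (e.g. UTC) before subtracting."""
    hours = (settle_time - forecast_time).total_seconds() / 3600
    for label, edge in [("<24h", 24), ("1-7d", 168), ("7-30d", 720)]:
        if hours <= edge:
            return label
    return ">30d"
```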
7) Calculation details
• do you apply probability clipping, especially for log loss (sketched after this list)
• how you treat duplicate forecasts, edits, and updates
• whether you weight forecasts equally or by some attribute, and why
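Clipping matters mainly for log loss, where an unclipped 0 or 1 on a missed outcome is an infinite penalty. A sketch, with eps as a disclosed policy choice:

```python
import math

def clipped_log_loss(p, outcome, eps=1e-4):
    """Log loss with probabilities clipped to [eps, 1 - eps].
    eps is a methodological choice and belongs in the disclosure."""
    p = min(max(p, eps), 1 - eps)
    return -(outcome * math.log(p) + (1 - outcome) * math.log(1 - p))
```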
What a good “Methodology” box looks like
A simple, readable example:
• Eligible set: all binary markets in Category A and B, open at least 24h
• Checkpoint: score the last forecast made at or before T-24h
• Metric: Brier score (equal weight per market)
• Benchmark: base rate by category for BSS
• Reporting: show N and coverage, plus a calibration table with 10 buckets (sketched below)
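The calibration table from that box could be produced along these lines; a sketch with ten equal-width buckets:

```python
def calibration_table(probs, outcomes, n_buckets=10):
    """Rows of (bucket, count, mean forecast, observed frequency)."""
    rows = []
    for b in range(n_buckets):
        lo, hi = b / n_buckets, (b + 1) / n_buckets
        top = n_buckets - 1  # the last bucket also catches p == 1.0
        grp = [(p, o) for p, o in zip(probs, outcomes)
               if lo <= p < hi or (b == top and p == 1.0)]
        if grp:
            mean_p = sum(p for p, _ in grp) / len(grp)
            freq = sum(o for _, o in grp) / len(grp)
            rows.append((f"{lo:.1f}-{hi:.1f}", len(grp), mean_p, freq))
    return rows
```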
Common ways scorecards become misleading
• hiding low coverage
• scoring only final forecasts (this rewards waiting until the outcome is nearly decided)
• changing benchmark definitions over time
• using last trade as “market consensus” in thin markets
• not stating how voids and disputes are handled
Takeaway
If you want people to trust a forecasting scorecard, disclose the methodology. Eligibility, checkpoints, the benchmark definition, and N with coverage are the minimum. Without them, the score is not interpretable and comparisons are not fair.