
Scorecard Methodology: What You Must Disclose

January 1, 2026 · Scorecards

Why methodology disclosure matters

A forecasting scorecard looks simple: a single number, such as a Brier score or a Brier skill score (BSS).

But without the methodology behind it, that number is uninterpretable: small measurement choices can shift it substantially.

If you want your scorecard to be credible, publish the rules that define the measurement.

The minimum disclosure checklist

At a minimum, disclose these items in plain language:

1) What gets scored

• event type: binary only, or multi-class as well

• eligible market set definition (categories, time range, exclusions)

• how you handle voids, disputes, and ambiguous outcomes

2) Which forecast gets scored

• whether you score the final forecast before settlement

• or use an evaluation checkpoint rule (recommended)

• how you handle a forecaster with no forecast before the checkpoint (treated as missing vs. default-filled); a checkpoint rule is sketched below
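To make the checkpoint rule concrete, here is a minimal sketch in Python. The function name `forecast_at_checkpoint` and the 24-hour default are illustrative assumptions, not a prescribed implementation.

```python
from datetime import datetime, timedelta
from typing import Optional

def forecast_at_checkpoint(forecasts: list[tuple[datetime, float]],
                           settlement: datetime,
                           hours_before: float = 24.0) -> Optional[float]:
    """Return the last forecast made at or before the checkpoint
    (settlement minus hours_before). Returns None when the forecaster
    had nothing on record by then: treat that as missing, and disclose
    whether missing entries are excluded or default-filled."""
    checkpoint = settlement - timedelta(hours=hours_before)
    eligible = [(ts, p) for ts, p in forecasts if ts <= checkpoint]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]
```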

3) Outcome definition

• settlement source and how outcomes are mapped to 0 or 1

• how you handle partial outcomes or cancellations (one mapping approach is sketched below)
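One possible encoding, as a sketch: the settlement labels below are hypothetical placeholders, not any platform's real values.

```python
from typing import Optional

def map_outcome(settlement: str) -> Optional[float]:
    """Map a settlement label to a scoring outcome. Resolved markets
    become 1.0 or 0.0; voids, cancellations, and open disputes return
    None and are excluded from scoring rather than scored (a choice
    that should itself be disclosed). Labels here are placeholders."""
    if settlement in ("VOID", "CANCELLED", "DISPUTED"):
        return None
    return {"YES": 1.0, "NO": 0.0}[settlement]
```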

4) Benchmark definition (for BSS)

State which benchmark you use:

• 50/50

• base rate (recommended default; a BSS sketch using this benchmark follows this list)

• market consensus (only with liquidity rules)

If you use market consensus, disclose:

• last trade vs mid price vs VWAP

• consensus timestamp or window

• liquidity filters (spread cap, minimum volume, depth)
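For concreteness, a minimal sketch of a Brier skill score against a base-rate benchmark. Computing the base rate from the scored sample itself, as below, is the simplest assumption; per-category or out-of-sample base rates are common refinements, and whichever you use should be disclosed.

```python
def brier(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs: list[float], outcomes: list[int]) -> float:
    """BSS = 1 - BS / BS_ref, where the reference forecast is the
    base rate for every market. Positive BSS beats the benchmark.
    Undefined when every outcome is identical (BS_ref is zero)."""
    base_rate = sum(outcomes) / len(outcomes)
    bs_ref = brier([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier(probs, outcomes) / bs_ref
```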

5) Sample size and coverage

Every headline score should display:

• sample size (N)

• coverage relative to eligibility

Otherwise you invite selection bias and fake skill.
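A sketch of what that reporting can look like, so N and coverage travel with the headline number; the format is illustrative.

```python
def headline(metric: str, score: float, n_scored: int, n_eligible: int) -> str:
    """Format a headline score with its sample size and coverage,
    e.g. 'Brier 0.182 (N=143, coverage 81%)'."""
    return (f"{metric} {score:.3f} "
            f"(N={n_scored}, coverage {n_scored / n_eligible:.0%})")
```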

6) Time handling

• time zone

• how you handle forecast timestamps

• if you use horizons, how you bucket forecast horizon (a bucketing sketch follows this list)

• whether you use rolling windows and what the window size is
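As one way to bucket forecast horizon, a sketch with illustrative bucket edges; whatever edges you pick, disclose them.

```python
from datetime import datetime

def horizon_bucket(forecast_ts: datetime, settlement: datetime) -> str:
    """Label a forecast by its lead time before settlement.
    Both timestamps are assumed to share the same (disclosed) time zone.
    The edges below are illustrative, not a standard."""
    hours = (settlement - forecast_ts).total_seconds() / 3600
    for label, edge_hours in (("<24h", 24), ("24h-7d", 168), ("7d-30d", 720)):
        if hours < edge_hours:
            return label
    return ">30d"
```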

7) Calculation details

• whether you apply probability clipping (especially for log loss; sketched after this list)

• how you treat duplicate forecasts, edits, or updates

• whether you weight forecasts equally or by some factor (and why)
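A sketch of clipping for log loss; the epsilon value is a disclosure item in itself, since it caps the penalty for a confidently wrong forecast.

```python
import math

def log_loss(p: float, y: int, eps: float = 1e-6) -> float:
    """Per-forecast log loss with probability clipping. Clipping p into
    [eps, 1 - eps] keeps p = 0 or p = 1 from producing an infinite
    loss; disclose the chosen eps, since it bounds the worst-case
    penalty."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
```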

What a good “Methodology” box looks like

A simple, readable example:

• Eligible set: all binary markets in Category A and B, open at least 24h

• Checkpoint: score the last forecast at T-24h

• Metric: Brier score (equal weight per market)

• Benchmark: base rate by category for BSS

• Reporting: show N and coverage, plus a calibration table with 10 buckets (sketched below)
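A sketch of that 10-bucket calibration table, assuming equal-width probability buckets; the bucketing scheme and bucket count are themselves disclosure items.

```python
def calibration_table(probs: list[float], outcomes: list[int],
                      buckets: int = 10) -> list[tuple[str, int, float]]:
    """Return (bucket range, count, observed frequency) for each
    non-empty equal-width probability bucket. Comparing each bucket's
    midpoint to its observed frequency reveals miscalibration."""
    rows = []
    for b in range(buckets):
        lo, hi = b / buckets, (b + 1) / buckets
        in_bucket = [y for p, y in zip(probs, outcomes)
                     if lo <= p < hi or (b == buckets - 1 and p == 1.0)]
        if in_bucket:
            rows.append((f"{lo:.1f}-{hi:.1f}", len(in_bucket),
                         sum(in_bucket) / len(in_bucket)))
    return rows
```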

Common ways scorecards become misleading

• hiding low coverage

• scoring only final forecasts (rewards waiting)

• changing benchmark definitions over time

• using last trade as “market consensus” in thin markets

• not stating how voids and disputes are handled

Takeaway

If you want people to trust a forecasting scorecard, disclose the methodology. Eligibility, checkpoints, benchmark definition, and N plus coverage are the minimum. Without them, the score is not interpretable and comparisons are not fair.

Related

Evaluation Checkpoints

Selection Bias and Coverage

Benchmarking Against the Market

Methodology