Open Methodology

Scoring Methodology

Our scoring is fully deterministic and openly documented. No "proprietary trade secrets" — just math you can verify.

Aggregate Risk Score

The aggregate risk score is a single integer from 0 to 100, computed from four components:

raw_points = BASE_POINTS(platform) + sum(driver_points) - mitigation + complexity
aggregate_risk_score = clamp(round(raw_points), 0, 100)

Given identical inputs and version stamps, this formula produces bit-for-bit identical output. Every response includes four version stamps (heuristics_version, stat_model_version, llm_extractor_version, driver_taxonomy_version) so you can verify reproducibility.
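The formula above can be sketched in a few lines of Python. This is an illustrative sketch, not the production code: the function name, argument shapes, and lowercase platform keys are assumptions, and the tie-handling of round() is not specified in this document (Python's built-in rounds halves to the nearest even integer).

```python
# Base points per platform, taken from the Platform Base Points table.
BASE_POINTS = {"polymarket": 12, "kalshi": 8}


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def aggregate_risk_score(platform, driver_points, mitigation, complexity):
    """Combine the four components into an integer score in [0, 100].

    driver_points is an iterable of per-driver point contributions;
    mitigation is subtracted and complexity is added, as in the formula.
    """
    raw_points = BASE_POINTS[platform] + sum(driver_points) - mitigation + complexity
    # Assumption: Python's round() (half-to-even); the source only says "round".
    return int(clamp(round(raw_points), 0, 100))
```

For example, aggregate_risk_score("polymarket", [20.4, 15.0], 5.0, 3.0) combines 12 + 35.4 - 5 + 3 = 45.4 raw points into a score of 45.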

Platform Base Points

Different platforms have different dispute mechanisms and governance structures. We account for this with platform-specific base scores:

Platform     Base Points   Rationale
Polymarket   12            UMA oracle system with dispute bonds; higher base risk
Kalshi       8             CFTC-regulated; centralized resolution

Tier Boundaries

The aggregate score maps to four risk tiers:

Tier       Score range
LOW        [0, 20)
MEDIUM     [20, 50)
HIGH       [50, 75)
CRITICAL   [75, 100]
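Because each range is half-open except the last, a boundary score belongs to the higher tier (20 is MEDIUM, 75 is CRITICAL). A minimal sketch of the mapping, with a hypothetical function name:

```python
def risk_tier(score):
    """Map an aggregate risk score in [0, 100] to its tier label."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    if score < 20:
        return "LOW"        # [0, 20)
    if score < 50:
        return "MEDIUM"     # [20, 50)
    if score < 75:
        return "HIGH"       # [50, 75)
    return "CRITICAL"       # [75, 100]
```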

Dispute Probability

We estimate the probability of a dispute being filed using the aggregate risk score plus platform and tier indicators:

p_dispute = clamp(0.01 + 0.003 * score + 0.05 * I(polymarket) + 0.06 * I(CRITICAL), 0, 0.90)

Where:

  • I(polymarket) = 1 if the market is on Polymarket (UMA dispute system), 0 otherwise
  • I(CRITICAL) = 1 if the tier is CRITICAL, 0 otherwise
  • The result is clamped to [0, 0.90]

Example: a Polymarket market with score 67 (HIGH tier): 0.01 + 0.003 * 67 + 0.05 * 1 + 0.06 * 0 = 0.261
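The same calculation as a short Python sketch. The function name and the string encodings of platform and tier are assumptions made for illustration; the coefficients come from the formula above.

```python
def dispute_probability(score, platform, tier):
    """Estimate the probability that a dispute is filed.

    score is the aggregate risk score (0 to 100), platform is a
    lowercase platform name, and tier is the tier label for that score.
    """
    is_polymarket = 1 if platform == "polymarket" else 0
    is_critical = 1 if tier == "CRITICAL" else 0
    p = 0.01 + 0.003 * score + 0.05 * is_polymarket + 0.06 * is_critical
    # Clamp to [0, 0.90] as in the formula.
    return max(0.0, min(0.90, p))


# Worked example from the text: Polymarket, score 67, HIGH tier.
p = dispute_probability(67, "polymarket", "HIGH")
print(round(p, 3))  # 0.261
```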

Settlement Delay Model

We model settlement delay as a lognormal distribution, which captures the right-skewed nature of resolution timelines (most markets resolve quickly, but some take much longer).

mu = ln(median_hours)
sigma = base_sigma * (1 + ambiguity_factor + spof_factor)

p50 = exp(mu) = median_hours
p90 = exp(mu + 1.282 * sigma)
p99 = exp(mu + 2.326 * sigma)

The sigma parameter is adjusted upward when ambiguity drivers or single-point-of-failure oracle dependencies are detected, widening the distribution to reflect greater uncertainty.
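To show how the adjustment factors widen the tail, the sketch below evaluates the three quantiles directly from the formulas above. The function name and the input values in the usage note are made up for illustration; the document does not state actual values for base_sigma or the two factors.

```python
import math

# Standard normal quantiles used by the model for the 90th and 99th percentiles.
Z_90 = 1.282
Z_99 = 2.326


def settlement_delay_quantiles(median_hours, base_sigma,
                               ambiguity_factor=0.0, spof_factor=0.0):
    """Return p50, p90 and p99 settlement delay (hours) for a lognormal model."""
    mu = math.log(median_hours)
    sigma = base_sigma * (1 + ambiguity_factor + spof_factor)
    return {
        "p50": math.exp(mu),               # equals median_hours
        "p90": math.exp(mu + Z_90 * sigma),
        "p99": math.exp(mu + Z_99 * sigma),
    }
```

Calling settlement_delay_quantiles(24, 0.5) and then settlement_delay_quantiles(24, 0.5, ambiguity_factor=0.2, spof_factor=0.3) leaves p50 at 24 hours in both cases, while p90 and p99 grow in the second call because sigma rises from 0.5 to 0.75.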

Driver Taxonomy

Our LLM extractor identifies up to 15 risk drivers per market from the resolution rules text. Each driver has:

  • Type code — from a fixed taxonomy of 15 driver types
  • Strength — LOW, MEDIUM, or HIGH
  • Confidence — 0.0 to 1.0, reflecting extraction certainty
  • Points contribution — derived from base_points * strength_multiplier * confidence
  • Evidence span — exact character offsets into the canonicalized rules text

Drivers are sorted by points_contribution in descending order. Ties are broken first by higher confidence, then by higher strength, then alphabetically by driver_type.
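The points calculation and the sort order can be sketched as follows. The Driver class, the strength multiplier values, and the driver type names used in the usage note are hypothetical; the document does not publish them here. Only the product base_points * strength_multiplier * confidence and the tiebreaker order are taken from the text.

```python
from dataclasses import dataclass

# Ordering of strengths for the tiebreaker (HIGH outranks MEDIUM outranks LOW).
STRENGTH_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}

# Hypothetical multipliers, chosen only for illustration.
STRENGTH_MULTIPLIER = {"LOW": 0.5, "MEDIUM": 1.0, "HIGH": 2.0}


@dataclass(frozen=True)
class Driver:
    driver_type: str
    strength: str       # "LOW", "MEDIUM" or "HIGH"
    confidence: float   # 0.0 to 1.0
    base_points: float

    @property
    def points_contribution(self):
        return self.base_points * STRENGTH_MULTIPLIER[self.strength] * self.confidence


def sort_drivers(drivers):
    """Sort by points descending, then confidence, then strength, then type name."""
    return sorted(
        drivers,
        key=lambda d: (
            -d.points_contribution,
            -d.confidence,
            -STRENGTH_RANK[d.strength],
            d.driver_type,
        ),
    )
```

With these illustrative multipliers, Driver("ORACLE_SPOF", "HIGH", 0.25, 8.0) and Driver("AMBIGUOUS_TERM", "MEDIUM", 0.5, 8.0) both contribute 4.0 points, so the second sorts first because its confidence is higher.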

Determinism Guarantee

Every score response includes four version stamps:

  • heuristics_version — the scoring formula version
  • stat_model_version — always "none" in v1 (heuristic-only)
  • llm_extractor_version — the extraction prompt version
  • driver_taxonomy_version — the driver registry version

Given the same rules text and version stamps, the scoring pipeline produces identical output. This is critical for backtesting and audit trails.
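For backtesting, one way to use this guarantee is to key stored scores by the rules text plus the four version stamps, so that two runs are compared only when they should match. The sketch below assumes the response is a flat dict containing the four stamp fields; that shape and the function name are assumptions, not a description of the actual API.

```python
import hashlib
import json

VERSION_FIELDS = (
    "heuristics_version",
    "stat_model_version",
    "llm_extractor_version",
    "driver_taxonomy_version",
)


def reproducibility_key(rules_text, response):
    """Build a stable key from the rules text and the four version stamps.

    Two score responses with the same key are expected to be identical.
    """
    payload = {field: response[field] for field in VERSION_FIELDS}
    payload["rules_sha256"] = hashlib.sha256(rules_text.encode("utf-8")).hexdigest()
    # sort_keys makes the serialization independent of dict insertion order.
    serialized = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

If two responses share a key but differ in score, that points to a determinism bug; if their keys differ, a score change can be attributed to a rules edit or a version bump.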

Why We Publish Our Methodology

Other providers hide behind "proprietary trade secrets." We believe that transparency is essential for trust — especially when traders are making financial decisions based on our scores. You can verify every number we produce.
