Scoring Methodology
Our scoring is fully deterministic and openly documented. No "proprietary trade secrets" — just math you can verify.
Aggregate Risk Score
The aggregate risk score is a single integer from 0 to 100, computed from four components:
aggregate_risk_score = clamp(round(raw_points), 0, 100)
Given identical inputs and version stamps, this formula produces bit-for-bit identical output. Every response includes four version stamps (heuristics_version, stat_model_version, llm_extractor_version, driver_taxonomy_version) so you can verify reproducibility.
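As a minimal sketch of the round-and-clamp step, assuming raw_points has already been summed from the components: the function name is illustrative, and using Python's built-in round() is an assumption, since the formula above does not say how exact .5 ties are rounded.

```python
def aggregate_risk_score(raw_points: float) -> int:
    """Round raw points to the nearest integer, then clamp to 0..100."""
    # Assumption: Python's built-in round(), which sends exact .5 ties
    # to the nearest even integer. The published formula only says round().
    rounded = round(raw_points)
    return max(0, min(100, rounded))


if __name__ == "__main__":
    for raw in (-5.0, 67.4, 140.2):
        print(raw, "->", aggregate_risk_score(raw))
```

Whichever tie rule the production pipeline uses, it has to be fixed per heuristics_version for the output to stay reproducible.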
Platform Base Points
Different platforms have different dispute mechanisms and governance structures. We account for this with platform-specific base scores:
| Platform | Base Points | Rationale |
|---|---|---|
| Polymarket | 12 | UMA oracle system with dispute bonds; higher base risk |
| Kalshi | 8 | CFTC-regulated; centralized resolution |
Tier Boundaries
The aggregate score maps to four risk tiers:
Dispute Probability
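We estimate the probability of a dispute being filed using the aggregate risk score plus platform and tier indicators:
dispute_probability = 0.01 + 0.003 * aggregate_risk_score + 0.05 * I(polymarket) + 0.06 * I(CRITICAL)
Where: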
- I(polymarket) = 1 if the market is on Polymarket (UMA dispute system), 0 otherwise
- I(CRITICAL) = 1 if the tier is CRITICAL, 0 otherwise
- The result is clamped to [0, 0.90]
Example: a Polymarket market with score 67 (HIGH tier): 0.01 + 0.003 * 67 + 0.05 * 1 + 0.06 * 0 = 0.261
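The worked example can be reproduced with a short sketch. The coefficients are taken from the example; the function and parameter names are illustrative assumptions, and the tier is passed in as a flag rather than derived from the score.

```python
def dispute_probability(score: int, on_polymarket: bool, is_critical: bool) -> float:
    """Linear dispute estimate, clamped to [0, 0.90]."""
    raw = (
        0.01                          # intercept
        + 0.003 * score               # aggregate risk score term
        + 0.05 * int(on_polymarket)   # I(polymarket)
        + 0.06 * int(is_critical)     # I(CRITICAL)
    )
    return max(0.0, min(0.90, raw))


if __name__ == "__main__":
    # Polymarket market, score 67, HIGH tier (so not CRITICAL).
    print(round(dispute_probability(67, True, False), 3))  # 0.261
```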
Settlement Delay Model
We model settlement delay as a lognormal distribution, which captures the right-skewed nature of resolution timelines (most markets resolve quickly, but some take much longer).
sigma = base_sigma * (1 + ambiguity_factor + spof_factor)
p50 = exp(mu) = median_hours
p90 = exp(mu + 1.282 * sigma)
p99 = exp(mu + 2.326 * sigma)
The sigma parameter is adjusted upward when ambiguity drivers or single-point-of-failure oracle dependencies are detected, widening the distribution to reflect greater uncertainty.
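A minimal sketch of the quantile formulas, assuming mu = ln(median_hours) (which follows from p50 = exp(mu) = median_hours). The function name, the argument defaults, and the input values in the usage lines are illustrative, not published parameters.

```python
import math

# Standard normal quantiles used in the formulas above.
Z_P90 = 1.282
Z_P99 = 2.326


def settlement_delay_quantiles(
    median_hours: float,
    base_sigma: float,
    ambiguity_factor: float = 0.0,
    spof_factor: float = 0.0,
) -> dict:
    """Return p50/p90/p99 settlement delay in hours for a lognormal model."""
    if median_hours <= 0:
        raise ValueError("median_hours must be positive")
    mu = math.log(median_hours)
    # Sigma widens when ambiguity or single-point-of-failure drivers are present.
    sigma = base_sigma * (1 + ambiguity_factor + spof_factor)
    return {
        "p50": math.exp(mu),
        "p90": math.exp(mu + Z_P90 * sigma),
        "p99": math.exp(mu + Z_P99 * sigma),
    }


if __name__ == "__main__":
    # Hypothetical inputs: 24-hour median, base sigma 0.5.
    print(settlement_delay_quantiles(24.0, 0.5))
    print(settlement_delay_quantiles(24.0, 0.5, ambiguity_factor=0.2, spof_factor=0.1))
```

The median is unaffected by the adjustment factors; only the tail quantiles move outward as sigma grows.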
Driver Taxonomy
Our LLM extractor identifies up to 15 risk drivers per market from the resolution rules text. Each driver has:
- Type code — from a fixed taxonomy of 15 driver types
- Strength — LOW, MEDIUM, or HIGH
- Confidence — 0.0 to 1.0, reflecting extraction certainty
- Points contribution — derived from base_points * strength_multiplier * confidence
- Evidence span — exact character offsets into the canonicalized rules text
Drivers are sorted by points_contribution in descending order. Ties are broken, in order, by higher confidence, then higher strength, then alphabetical driver_type.
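The points formula and the sort order can be sketched as follows. The strength multiplier values, the base points, and the driver type names used here are hypothetical placeholders; none of these values come from the published methodology.

```python
from dataclasses import dataclass

# Hypothetical multipliers, for illustration only.
STRENGTH_MULTIPLIER = {"LOW": 0.5, "MEDIUM": 1.0, "HIGH": 1.5}
STRENGTH_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


@dataclass(frozen=True)
class Driver:
    driver_type: str
    strength: str       # "LOW", "MEDIUM", or "HIGH"
    confidence: float   # 0.0 to 1.0
    base_points: float

    @property
    def points_contribution(self) -> float:
        return self.base_points * STRENGTH_MULTIPLIER[self.strength] * self.confidence


def sort_drivers(drivers):
    """Points descending, then confidence, then strength, then type A-Z."""
    return sorted(
        drivers,
        key=lambda d: (
            -d.points_contribution,
            -d.confidence,
            -STRENGTH_RANK[d.strength],
            d.driver_type,
        ),
    )


if __name__ == "__main__":
    drivers = [
        Driver("TYPE_B", "MEDIUM", 0.5, 6.0),  # 3.0 points
        Driver("TYPE_A", "HIGH", 0.5, 4.0),    # 3.0 points, higher strength
        Driver("TYPE_C", "HIGH", 1.0, 10.0),   # 15.0 points
    ]
    print([d.driver_type for d in sort_drivers(drivers)])
```

Negating the numeric keys keeps the whole ordering in a single sorted() call while leaving the final alphabetical tiebreaker ascending.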
Determinism Guarantee
Every score response includes four version stamps:
- heuristics_version — the scoring formula version
- stat_model_version — always "none" in v1 (heuristic-only)
- llm_extractor_version — the extraction prompt version
- driver_taxonomy_version — the driver registry version
Given the same rules text and version stamps, the scoring pipeline produces identical output. This is critical for backtesting and audit trails.
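For backtesting, one way to use the stamps is to confirm that two responses for the same rules text share all four versions before comparing their scores. This is a sketch under assumptions: it treats a response as a flat dict with the four stamp fields and an aggregate_risk_score field, which may not match the actual response shape.

```python
VERSION_STAMP_KEYS = (
    "heuristics_version",
    "stat_model_version",
    "llm_extractor_version",
    "driver_taxonomy_version",
)


def version_stamps(response: dict) -> tuple:
    """Extract the four version stamps in a fixed order."""
    return tuple(response[key] for key in VERSION_STAMP_KEYS)


def scores_reproduced(stored: dict, fresh: dict) -> bool:
    """Compare scores only when both responses carry identical stamps."""
    if version_stamps(stored) != version_stamps(fresh):
        raise ValueError("version stamps differ; scores are not comparable")
    return stored["aggregate_risk_score"] == fresh["aggregate_risk_score"]
```

Raising on a stamp mismatch, rather than returning False, keeps a version change from being mistaken for a reproducibility failure.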
Why We Publish Our Methodology
Other providers hide behind "proprietary trade secrets." We believe that transparency is essential for trust — especially when traders are making financial decisions based on our scores. You can verify every number we produce.
Test the scoring on your own resolution rules.