Dispute Probability Calibration: Sizing Positions Around Resolution Risk
Executive Summary
SettleRisk returns a scalar p_dispute for every market — the modeled probability that resolution will be contested through a formal dispute process. That number is easy to display in a UI and hard to act on. A 0.18 dispute probability on a market trading at 70 cents does not, by itself, tell a market maker how much capital to commit. This post walks through a closed-form sizing framework that takes p_dispute, the tier, and the expected delay distribution and produces a single number: maximum exposure as a percentage of book.
The model is intentionally conservative: it leaves easy money on the table on LOW-tier markets to avoid blowing up on CRITICAL ones, because the historical loss asymmetry favors that trade.
Core Concept
p_dispute is a calibrated tail probability, not a directional signal. It tells you how often markets that look like this one end up in a dispute window; it tells you nothing about which side wins. The right way to use it is as a haircut on Kelly-style sizing, not as a directional prior.
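To make "haircut, not prior" concrete, here is a minimal sketch assuming a YES contract that pays $1, bought at price c, where q is your own estimate of P(YES). Full Kelly for that bet is (q - c) / (1 - c); p_dispute then only scales the size down and never flips the side. The helper names kelly_fraction_yes and dispute_haircut are hypothetical, and full Kelly is used purely for simplicity.

```python
def kelly_fraction_yes(q: float, price: float) -> float:
    """Full-Kelly bankroll fraction for buying a $1 YES contract at `price`
    when your own estimate of P(YES) is q. Returns 0 with no positive edge."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    return max(0.0, (q - price) / (1.0 - price))


def dispute_haircut(fraction: float, p_dispute: float) -> float:
    """Scale an already-computed size by the clean-settle probability.
    p_dispute only shrinks the size; it never changes which side you take."""
    return fraction * (1.0 - p_dispute)


# Illustrative inputs: you believe YES is 80% likely and the ask is 72 cents.
raw = kelly_fraction_yes(q=0.80, price=0.72)
sized = dispute_haircut(raw, p_dispute=0.247)
print(f"raw Kelly: {raw:.4f}, after dispute haircut: {sized:.4f}")
# raw Kelly: 0.2857, after dispute haircut: 0.2151
```

The direction comes entirely from q versus price; p_dispute enters only as a multiplier on the magnitude.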
The closed-form formula:
max_exposure_pct = base_pct * (1 - p_dispute) * delay_haircut(tier) * platform_factor
Where:
- base_pct is your un-haircut maximum book share for this strategy (e.g. 2.0% for a tight-quoting MM book)
- p_dispute is the SettleRisk output, clamped to [0.01, 0.90]
- delay_haircut(tier) shrinks exposure as expected settlement delays widen: LOW = 1.00, MEDIUM = 0.80, HIGH = 0.55, CRITICAL = 0.25
- platform_factor adjusts for venue-specific recovery rates (Polymarket = 0.85, Kalshi = 1.00)
The intuition: (1 - p_dispute) is the probability of a clean settle. delay_haircut prices the carrying cost of capital you cannot get out. platform_factor reflects the empirical observation that disputed Polymarket resolutions cost more in opportunity terms than disputed Kalshi ones, even before accounting for capital lockup.
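As a pure function, independent of any SDK, the rule is only a few lines. This is a sketch: max_exposure_pct is a hypothetical name, and the two lookup tables reuse the tier and venue values from the worked example's code.

```python
# Tier and venue values as used in the worked example.
DELAY_HAIRCUTS = {"LOW": 1.00, "MEDIUM": 0.80, "HIGH": 0.55, "CRITICAL": 0.25}
PLATFORM_FACTOR = {"polymarket": 0.85, "kalshi": 1.00}


def max_exposure_pct(base_pct: float, p_dispute: float, tier: str, platform: str) -> float:
    """base_pct * (1 - p_dispute) * delay_haircut(tier) * platform_factor,
    returned as a fraction of book (0.01 == 1%)."""
    p = min(max(p_dispute, 0.01), 0.90)  # clamp to [0.01, 0.90]
    return base_pct * (1.0 - p) * DELAY_HAIRCUTS[tier] * PLATFORM_FACTOR[platform]


print(f"{max_exposure_pct(0.02, 0.247, 'HIGH', 'polymarket'):.4%}")  # 0.7041%
```

Unknown tiers or venues raise a KeyError here, which is arguably the right failure for a risk limit: better to refuse to size than to guess a factor.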
Worked Example
A Polymarket market on "Will US-China tariffs exceed 25% by June 30, 2026?" is quoted at 68/72 cents. SettleRisk returns:
| Field | Value |
|-------|-------|
| aggregate_risk_score | 67 |
| tier | HIGH |
| p_dispute | 0.247 |
| delay_p50_hours | 96 |
| delay_p90_hours | 312 |
Applying the closed-form rule with a 2% base exposure cap:
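```python
from settlerisk import SettleRiskClient

client = SettleRiskClient(api_key="sk-...")
score = client.get_risk_score("polymarket", "0xtariffmarket")

# Tier-based delay haircuts and venue recovery factors
DELAY_HAIRCUTS = {"LOW": 1.00, "MEDIUM": 0.80, "HIGH": 0.55, "CRITICAL": 0.25}
PLATFORM_FACTOR = {"polymarket": 0.85, "kalshi": 1.00}

base_pct = 0.02  # un-haircut cap: 2% of book

# Clamp p_dispute to [0.01, 0.90] as the formula specifies
p_dispute = min(max(score.p_dispute, 0.01), 0.90)

exposure = (
    base_pct
    * (1 - p_dispute)                  # probability of a clean settle
    * DELAY_HAIRCUTS[score.tier]       # carrying cost of locked-up capital
    * PLATFORM_FACTOR[score.platform]  # venue recovery factor
)
print(f"Max exposure: {exposure:.4%} of book")
# Max exposure: 0.7041% of book
```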
The same calculation in TypeScript:
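```typescript
import { SettleRiskClient } from "settlerisk";

const client = new SettleRiskClient({ apiKey: "sk-..." });
const score = await client.getRiskScore("polymarket", "0xtariffmarket");

// Typed as Record<string, number> so they can be indexed by score.tier and score.platform
const DELAY_HAIRCUTS: Record<string, number> = { LOW: 1.0, MEDIUM: 0.8, HIGH: 0.55, CRITICAL: 0.25 };
const PLATFORM_FACTOR: Record<string, number> = { polymarket: 0.85, kalshi: 1.0 };

// Clamp p_dispute to [0.01, 0.90] as the formula specifies
const pDispute = Math.min(Math.max(score.p_dispute, 0.01), 0.9);

const exposure =
  0.02 *
  (1 - pDispute) *
  DELAY_HAIRCUTS[score.tier] *
  PLATFORM_FACTOR[score.platform];
console.log(`Max exposure: ${(exposure * 100).toFixed(4)}% of book`);
// Max exposure: 0.7041% of book
```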
On a $10M book, that translates to a hard cap of about $70,400 of net exposure on this single market. A market maker quoting two-sided might still cycle 5-10x that in gross volume per day, but should not be net long or net short that market beyond the cap.
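The percentage-to-dollar conversion and the net-versus-gross distinction can be sketched in a few lines. dollar_cap and clip_net_position are hypothetical helpers, and the convention that a positive number means net long YES is an assumption for illustration.

```python
def dollar_cap(book_usd: float, exposure_pct: float) -> float:
    """Convert a fractional exposure cap (0.01 == 1%) into dollars."""
    return book_usd * exposure_pct


def clip_net_position(desired_net_usd: float, cap_usd: float) -> float:
    """Clamp a signed net position (positive = net long YES) to +/- cap_usd.
    Only the resulting net is limited; gross two-sided volume is not."""
    return max(-cap_usd, min(desired_net_usd, cap_usd))


cap = dollar_cap(10_000_000, 0.02 * (1 - 0.247) * 0.55 * 0.85)
print(f"cap: ${cap:,.2f}")
print(clip_net_position(125_000, cap))  # clipped down to the cap
print(clip_net_position(-20_000, cap))  # already inside the cap, unchanged
```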
Implementation Notes
The haircut table looks aggressive, and that is on purpose. The empirical loss distribution across disputed markets has a long right tail dominated by a handful of CRITICAL-tier blowups. The 0.25 multiplier on CRITICAL is a deliberate choice to keep those blowups to a small slice of the base cap before the position is unwound: even with p_dispute at its 0.01 floor, a CRITICAL market can take at most 0.99 × 0.25 = 24.75% of the base cap on Kalshi (about 21% on Polymarket), and at p_dispute = 0.40 that falls to 15%.
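A quick sweep shows how the CRITICAL-tier share of the base cap moves with p_dispute and venue. It reuses the clamp and factors stated above; critical_share_of_base is a hypothetical helper name.

```python
CRITICAL_HAIRCUT = 0.25
PLATFORM_FACTOR = {"polymarket": 0.85, "kalshi": 1.00}


def critical_share_of_base(p_dispute: float, platform: str) -> float:
    """Fraction of base_pct a CRITICAL-tier market may use."""
    p = min(max(p_dispute, 0.01), 0.90)  # same clamp as the sizing rule
    return (1.0 - p) * CRITICAL_HAIRCUT * PLATFORM_FACTOR[platform]


for p in (0.01, 0.20, 0.40, 0.60):
    print(
        f"p_dispute={p:.2f}",
        f"kalshi={critical_share_of_base(p, 'kalshi'):.2%}",
        f"polymarket={critical_share_of_base(p, 'polymarket'):.2%}",
    )
```

The ceiling is reached at the clamp floor on Kalshi; every higher p_dispute or lower platform factor only shrinks the share further.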
Cache the score, not the exposure. p_dispute updates whenever rules change or new disputes get logged on a similar market, and you want your sizing to recompute on every refresh of the score snapshot. Use the score.version block to detect when a refresh has occurred:
| Field | Cache strategy |
|-------|----------------|
| score.aggregate_risk_score | Cache 15min |
| score.p_dispute | Cache 15min |
| score.expected_delay | Cache 1h |
| score.version | Always check |
| exposure_pct (computed) | Never cache — recompute every quote |
For high-frequency strategies, subscribe to the score.tier_changed webhook so a tier change triggers immediate re-sizing rather than waiting for a 15-minute cache TTL.
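One way to wire that up, assuming your web framework has already verified the webhook and parsed the JSON body into a dict. The payload fields type, platform and market_id are assumptions rather than the documented schema, and invalidate and resize are hypothetical callbacks into your own cache and sizing code.

```python
from typing import Any, Callable, Dict


def make_webhook_handler(
    invalidate: Callable[[str, str], None],
    resize: Callable[[str, str], None],
) -> Callable[[Dict[str, Any]], bool]:
    """Build a handler that re-sizes immediately on a tier change."""

    def handle(event: Dict[str, Any]) -> bool:
        if event.get("type") != "score.tier_changed":
            return False  # not an event this handler acts on
        platform, market_id = event["platform"], event["market_id"]
        invalidate(platform, market_id)  # drop the stale snapshot first
        resize(platform, market_id)      # then recompute exposure from a fresh score
        return True

    return handle
```

Invalidating before re-sizing matters: otherwise the resize step could read the old snapshot out of the cache and reproduce the stale size.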
See the full driver schema in our methodology docs, and the underlying scoring formula for the components that combine into aggregate_risk_score.
Failure Modes
1. Treating p_dispute as a directional signal. A high p_dispute does not mean "fade the market" or "back the YES side." It is a tail probability for resolution failure. Operators who use it as a directional input invariably overfit to a handful of historical cases.
2. Compounding haircuts incorrectly. The formula above multiplies four factors. Some teams add the haircuts instead, or apply only the single most severe one. Adding produces sizes unrelated to the underlying math; keeping only the worst haircut throws the others away and sizes too aggressively.
3. Ignoring the version stamps. The same market scored under heuristics_version=1.0.0 and heuristics_version=1.1.0 can produce a meaningfully different p_dispute. If your sizing logic does not record which version produced the score, you cannot reconstruct why a position was sized the way it was during a post-mortem.
4. Stale tier on rules changes. When market rules get edited mid-life — common for political markets near elections — the entire score block invalidates. Subscribe to rules.changed and force a refresh; do not lean on TTL alone.
5. Skipping the platform_factor. Polymarket disputes resolve through UMA's optimistic oracle with bond-and-challenge economics. Kalshi disputes resolve through CFTC-supervised internal processes. The recovery and opportunity cost profiles are different enough that one shared factor flattens out a real edge.
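A quick comparison on the worked-example inputs shows why the compounding method in failure mode 2 matters. Because every haircut lies in [0, 1], keeping only the harshest one always yields a size at least as large as the product, and summing them can even exceed base_pct itself. The variable names are illustrative.

```python
import math

# Worked-example factors: clean-settle probability, HIGH delay haircut, Polymarket factor
FACTORS = (1 - 0.247, 0.55, 0.85)
BASE_PCT = 0.02

correct = BASE_PCT * math.prod(FACTORS)  # multiply every haircut (the closed-form rule)
worst_only = BASE_PCT * min(FACTORS)     # keeps only the single harshest haircut
additive = BASE_PCT * sum(FACTORS)       # sums haircuts; here it exceeds base_pct

for name, value in (("multiply", correct), ("worst only", worst_only), ("add", additive)):
    print(f"{name:>10}: {value:.4%} of book")
```

On these inputs the product gives about 0.70% of book, worst-only 1.10%, and the additive version 4.31%, more than double the un-haircut 2% cap.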
Checklist
- [ ] Pull p_dispute, tier, and platform from a fresh score snapshot
- [ ] Compute exposure with the four-factor closed-form rule
- [ ] Record the score.version block alongside every sized position
- [ ] Subscribe to score.tier_changed and rules.changed webhooks
- [ ] Backtest the sizing rule against your historical fills before going live
- [ ] Set a hard absolute dollar cap independent of the percentage cap
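The last checklist item, together with recording the version block, can be sketched as follows. SizedPosition and size_position are hypothetical names, and the contents of score_version (here a dict with a heuristics_version key) are an assumption about the shape of score.version; the point is only that the tighter of the two caps wins and the stamp travels with the position.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SizedPosition:
    market_id: str
    cap_usd: float      # the binding cap: min of the two below
    pct_cap_usd: float  # book_usd * exposure_pct
    abs_cap_usd: float  # hard dollar cap, independent of the percentage rule
    score_version: Any  # score.version block, stored verbatim for post-mortems


def size_position(market_id: str, book_usd: float, exposure_pct: float,
                  abs_cap_usd: float, score_version: Any) -> SizedPosition:
    pct_cap = book_usd * exposure_pct
    return SizedPosition(market_id, min(pct_cap, abs_cap_usd), pct_cap,
                         abs_cap_usd, score_version)


pos = size_position("0xtariffmarket", 10_000_000, 0.00704055, 50_000,
                    {"heuristics_version": "1.1.0"})
print(pos.cap_usd)  # the $50,000 hard cap binds before the ~$70,400 percentage cap
```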
Sources + Further Reading
- SettleRisk methodology page — full scoring formula and driver registry
- SettleRisk pricing engine — dispute-adjusted fair prices and capital lockup math
- Wolfers & Zitzewitz, Prediction Markets (2004) — original calibration framework for event markets
- UMA Documentation — bond-and-challenge dispute economics on Polymarket
- CFTC final order on Kalshi event contracts (2024) — regulatory backdrop for Kalshi dispute resolution
Want to see this applied to your book? Spin up a free key at /signup and pull a score on any Polymarket or Kalshi market in under 200ms.