Dispute Probability Calibration: Sizing Positions Around Resolution Risk

Executive Summary

SettleRisk returns a scalar p_dispute for every market: the modeled probability that resolution will be contested through a formal dispute process. That number is easy to display in a UI and hard to act on. A 0.18 dispute probability on a market trading at 70 cents does not, by itself, tell a market maker how much capital to commit. This post walks through a closed-form sizing framework that takes p_dispute, the risk tier (which stands in for the expected delay distribution), and the venue, and produces a single number: maximum exposure as a percentage of book.

The model is intentionally conservative. It leaves easy money on the table on LOW-tier markets to avoid blowing up on CRITICAL ones, because the historical loss asymmetry favors that trade.

Core Concept

p_dispute is a calibrated tail-probability, not a directional signal. It tells you how often markets that look like this one end up in a dispute window — it tells you nothing about which side wins or loses. The right way to use it is as a haircut on Kelly-style sizing, not as a directional prior.

The closed-form formula:

max_exposure_pct = base_pct * (1 - p_dispute) * delay_haircut(tier) * platform_factor

Where:

base_pct is your unhaircut maximum book share for this strategy (e.g. 2.0% for a tight-quoting MM book)
p_dispute is the SettleRisk output, clamped to [0.01, 0.90]
delay_haircut(tier) shrinks exposure when expected settlement delays widen (LOW=1.00, MEDIUM=0.80, HIGH=0.55, CRITICAL=0.25, as used in the worked example below)
platform_factor adjusts for venue-specific recovery rates (Polymarket=0.85, Kalshi=1.00)

The intuition: (1 - p_dispute) is the probability of a clean settle. delay_haircut prices the carrying cost of capital you cannot get out. platform_factor reflects the empirical observation that disputed Polymarket resolutions cost more in opportunity terms than disputed Kalshi ones, even before accounting for capital lockup.
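The formula can be sketched as a pure function. This is a minimal illustration, not SettleRisk SDK code: the names max_exposure_pct and clamp are hypothetical, and applying the [0.01, 0.90] clamp on the client side is an assumption. The haircut and platform values are the ones this post uses.

```python
# Illustrative sketch: function names and client-side clamping are assumptions.
DELAY_HAIRCUTS = {"LOW": 1.00, "MEDIUM": 0.80, "HIGH": 0.55, "CRITICAL": 0.25}
PLATFORM_FACTOR = {"polymarket": 0.85, "kalshi": 1.00}


def clamp(value: float, lo: float, hi: float) -> float:
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def max_exposure_pct(base_pct: float, p_dispute: float, tier: str, platform: str) -> float:
    """Four-factor cap: base * clean-settle probability * delay haircut * venue factor."""
    p = clamp(p_dispute, 0.01, 0.90)
    return base_pct * (1.0 - p) * DELAY_HAIRCUTS[tier] * PLATFORM_FACTOR[platform]


# Same inputs as the worked example: HIGH tier, Polymarket, p_dispute = 0.247
print(f"{max_exposure_pct(0.02, 0.247, 'HIGH', 'polymarket'):.4%}")  # 0.7041%
```

An unknown tier or platform raises KeyError here, which is usually preferable to silently sizing with a default factor.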

Worked Example

A Polymarket market on "Will US-China tariffs exceed 25% by June 30, 2026?" is quoted at 68/72 cents. SettleRisk returns:

| Field | Value |
|-------|-------|
| aggregate_risk_score | 67 |
| tier | HIGH |
| p_dispute | 0.247 |
| delay_p50_hours | 96 |
| delay_p90_hours | 312 |

Applying the closed-form rule with a 2% base exposure cap:

from settlerisk import SettleRiskClient

client = SettleRiskClient(api_key="sk-...")
score = client.get_risk_score("polymarket", "0xtariffmarket")

DELAY_HAIRCUTS = {"LOW": 1.00, "MEDIUM": 0.80, "HIGH": 0.55, "CRITICAL": 0.25}
PLATFORM_FACTOR = {"polymarket": 0.85, "kalshi": 1.00}

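base_pct = 0.02  # unhaircut cap: 2% of book
exposure = (
    base_pct
    * (1 - score.p_dispute)            # probability of a clean settle
    * DELAY_HAIRCUTS[score.tier]       # 0.55 for HIGH
    * PLATFORM_FACTOR[score.platform]  # 0.85 for Polymarket
)

print(f"Max exposure: {exposure:.4%} of book")
# 0.02 * 0.753 * 0.55 * 0.85 = 0.00704055
# Max exposure: 0.7041% of book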

The same calculation in TypeScript:

import { SettleRiskClient } from "settlerisk";

const client = new SettleRiskClient({ apiKey: "sk-..." });
const score = await client.getRiskScore("polymarket", "0xtariffmarket");

const DELAY_HAIRCUTS = { LOW: 1.0, MEDIUM: 0.8, HIGH: 0.55, CRITICAL: 0.25 };
const PLATFORM_FACTOR = { polymarket: 0.85, kalshi: 1.0 };

// 0.02 * 0.753 * 0.55 * 0.85 = 0.00704055
const exposure =
  0.02 *
  (1 - score.p_dispute) *
  DELAY_HAIRCUTS[score.tier] *
  PLATFORM_FACTOR[score.platform];

console.log(`Max exposure: ${(exposure * 100).toFixed(4)}% of book`);
// Max exposure: 0.7041% of book

On a $10M book, that translates to a hard cap of about $70,400 of net exposure on this single market. A market maker quoting two-sided might still cycle 5-10x that in gross volume per day, but should not be net-long or net-short in that market beyond the cap.
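The distinction between gross volume and net exposure can be sketched as a simple check. The helper names below are hypothetical and the position sizes are made-up illustrations; only the book size and the formula inputs come from the worked example.

```python
# Hypothetical helpers for illustration; not part of any SettleRisk SDK.

def net_exposure_usd(long_usd: float, short_usd: float) -> float:
    """Signed net exposure: positive is net-long, negative is net-short."""
    return long_usd - short_usd


def within_cap(long_usd: float, short_usd: float, cap_usd: float) -> bool:
    """True when the absolute net position does not exceed the cap."""
    return abs(net_exposure_usd(long_usd, short_usd)) <= cap_usd


book_usd = 10_000_000
exposure_pct = 0.02 * (1 - 0.247) * 0.55 * 0.85  # worked-example inputs
cap_usd = book_usd * exposure_pct

print(f"Cap: ${cap_usd:,.2f}")                 # Cap: $70,405.50
print(within_cap(400_000, 350_000, cap_usd))   # net $50,000 long  -> True
print(within_cap(100_000, 200_000, cap_usd))   # net $100,000 short -> False
```

Gross inventory of $750,000 passes in the first case because only the $50,000 net position counts against the cap.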

Implementation Notes

The haircut schedule looks aggressive, and that is on purpose. The empirical loss distribution across disputed markets has a long right tail dominated by a handful of CRITICAL-tier blowups. The 0.25 multiplier on CRITICAL is a deliberate choice: with p_dispute floored at 0.01 and platform_factor at most 1.00, a CRITICAL market can never be sized above roughly 25% of the base cap (0.99 × 0.25 = 0.2475), which bounds what is at risk before the position is unwound.
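The upper bound each tier places on sizing follows directly from the formula. A short sketch, assuming the p_dispute floor of 0.01 and the largest platform factor of 1.00 given earlier in this post:

```python
# Largest share of base_pct each tier can ever deploy, per the formula's bounds.
DELAY_HAIRCUTS = {"LOW": 1.00, "MEDIUM": 0.80, "HIGH": 0.55, "CRITICAL": 0.25}
P_DISPUTE_FLOOR = 0.01       # lower clamp on p_dispute
MAX_PLATFORM_FACTOR = 1.00   # Kalshi, the most favorable venue factor

max_fraction = {
    tier: (1 - P_DISPUTE_FLOOR) * haircut * MAX_PLATFORM_FACTOR
    for tier, haircut in DELAY_HAIRCUTS.items()
}

for tier, frac in max_fraction.items():
    print(f"{tier:<9}{frac:.2%}")
# LOW      99.00%
# MEDIUM   79.20%
# HIGH     54.45%
# CRITICAL 24.75%
```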

Cache the score, not the exposure. p_dispute updates whenever rules change or new disputes get logged on a similar market, and you want your sizing to recompute on every refresh of the score snapshot. Use the score.version block to detect when a refresh has occurred:

| Field | Cache strategy |
|-------|----------------|
| score.aggregate_risk_score | Cache 15min |
| score.p_dispute | Cache 15min |
| score.expected_delay | Cache 1h |
| score.version | Always check |
| exposure_pct (computed) | Never cache; recompute every quote |

For high-frequency strategies, subscribe to the score.tier_changed webhook so a tier change triggers immediate re-sizing rather than waiting for a 15-minute cache TTL.
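One way to combine the TTL with webhook-driven invalidation is sketched below. Everything here is an assumption for illustration: ScoreSnapshot, ScoreCache, and the fetch callable are hypothetical stand-ins, and the snapshot's version field is simplified to a single string. The real score.version block and webhook payloads may look different.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SCORE_TTL_SECONDS = 15 * 60  # 15-minute TTL for p_dispute and tier


@dataclass(frozen=True)
class ScoreSnapshot:
    p_dispute: float
    tier: str
    platform: str
    version: str       # simplified stand-in for the score.version block
    fetched_at: float  # seconds, on the same clock as `now` below


class ScoreCache:
    """Caches the score snapshot only; exposure is recomputed by the caller."""

    def __init__(self, fetch: Callable[[float], ScoreSnapshot],
                 ttl: float = SCORE_TTL_SECONDS) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._snapshot: Optional[ScoreSnapshot] = None

    def get(self, now: float) -> ScoreSnapshot:
        """Return the cached snapshot, refetching when missing or past TTL."""
        snap = self._snapshot
        if snap is None or now - snap.fetched_at >= self._ttl:
            snap = self._fetch(now)
            self._snapshot = snap
        return snap

    def invalidate(self) -> None:
        """Call from a score.tier_changed or rules.changed webhook handler."""
        self._snapshot = None
```

A quoting loop would call cache.get(now) on every quote and recompute exposure from the returned snapshot, so a webhook-triggered invalidate() takes effect on the next quote rather than at TTL expiry.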

See the full driver schema in our methodology docs, and the underlying scoring formula for the components that combine into aggregate_risk_score.

Failure Modes

1. Treating p_dispute as a directional signal. A high p_dispute does not mean "fade the market" or "back the YES side." It is a tail probability for resolution failure. Operators who use it as a directional input tend to overfit to a handful of historical cases.

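2. Compounding haircuts incorrectly. The formula above multiplies four factors. Some teams instead apply only the worst single haircut, which ignores the other two and produces sizes that are too aggressive. Others add the haircuts together, which produces a number that can exceed base_pct and has no relationship to the underlying math.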

3. Ignoring the version stamps. The same market scored under heuristics_version=1.0.0 and heuristics_version=1.1.0 can produce a meaningfully different p_dispute. If your sizing logic does not record which version produced the score, you cannot reconstruct why a position was sized the way it was during a post-mortem.

4. Stale tier on rules changes. When market rules get edited mid-life — common for political markets near elections — the entire score block invalidates. Subscribe to rules.changed and force a refresh; do not lean on TTL alone.

5. Skipping the platform_factor. Polymarket disputes resolve through UMA's optimistic oracle with bond-and-challenge economics. Kalshi disputes resolve through CFTC-supervised internal processes. The recovery and opportunity cost profiles are different enough that one shared factor flattens out a real edge.
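Failure mode 2 is easy to see with the worked example's inputs. The sketch below compares the multiplicative rule against the two incorrect variants; the variable names are illustrative.

```python
import math

base_pct = 0.02
# The three haircut factors from the worked example (HIGH tier, Polymarket)
factors = {
    "clean_settle": 1 - 0.247,
    "delay_haircut": 0.55,
    "platform_factor": 0.85,
}

product = base_pct * math.prod(factors.values())   # the rule in this post
worst_only = base_pct * min(factors.values())      # ignores two of three haircuts
additive = base_pct * sum(factors.values())        # not bounded by base_pct

print(f"product:    {product:.4%}")     # product:    0.7041%
print(f"worst only: {worst_only:.4%}")  # worst only: 1.1000%
print(f"additive:   {additive:.4%}")    # additive:   4.3060%
```

The worst-only variant sizes about 56% larger than the product, and the additive variant ends up above the 2% base cap it was supposed to shrink.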

Checklist

[ ] Pull p_dispute, tier, and platform from a fresh score snapshot
[ ] Compute exposure with the four-factor closed-form rule
[ ] Record the score.version block alongside every sized position
[ ] Subscribe to score.tier_changed and rules.changed webhooks
[ ] Backtest the sizing rule against your historical fills before going live
[ ] Set a hard absolute dollar cap independent of the percentage cap
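The record-keeping and dollar-cap items can be sketched together. This is a hypothetical shape, not a SettleRisk schema: SizedPosition, size_position, the $50,000 absolute cap, and the single heuristics_version string are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SizedPosition:
    market_id: str
    platform: str
    tier: str
    p_dispute: float
    heuristics_version: str  # which scoring version produced p_dispute
    exposure_pct: float      # four-factor percentage cap
    cap_usd: float           # binding cap after the absolute dollar limit


def size_position(market_id: str, platform: str, tier: str, p_dispute: float,
                  heuristics_version: str, exposure_pct: float,
                  book_usd: float, abs_cap_usd: float) -> SizedPosition:
    """Take the tighter of the percentage cap and the absolute dollar cap."""
    cap_usd = min(book_usd * exposure_pct, abs_cap_usd)
    return SizedPosition(market_id, platform, tier, p_dispute,
                         heuristics_version, exposure_pct, cap_usd)


record = size_position(
    market_id="0xtariffmarket",
    platform="polymarket",
    tier="HIGH",
    p_dispute=0.247,
    heuristics_version="1.1.0",
    exposure_pct=0.02 * (1 - 0.247) * 0.55 * 0.85,
    book_usd=10_000_000,
    abs_cap_usd=50_000.0,  # hypothetical absolute limit, tighter than ~$70,400
)

# Persist this alongside the fill so a post-mortem can reconstruct the sizing.
print(json.dumps(asdict(record), sort_keys=True))
```

Here the absolute cap binds, so the recorded cap_usd is 50,000 rather than the percentage-derived figure.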

Sources + Further Reading

SettleRisk methodology page — full scoring formula and driver registry
SettleRisk pricing engine — dispute-adjusted fair prices and capital lockup math
Wolfers & Zitzewitz, Prediction Markets (2004) — original calibration framework for event markets
UMA Documentation — bond-and-challenge dispute economics on Polymarket
CFTC final order on Kalshi event contracts (2024) — regulatory backdrop for Kalshi dispute resolution

Want to see this applied to your book? Spin up a free key at /signup and pull a score on any Polymarket or Kalshi market in under 200ms.