Driver Attribution: How 15 Typed Risk Factors Combine into One Score
Executive Summary
A single number like "67 / 100, HIGH tier" is easy to display and trivial to ignore. The reason SettleRisk exposes the per-driver breakdown is that a HIGH score caused by AMBIGUOUS_WORDING is a totally different risk profile from a HIGH score caused by RETROACTIVE_RULE_CHANGE. The first is partially mitigable through tight stops; the second is not mitigable at all.
This post walks through the 15-driver taxonomy, the deterministic arithmetic that combines them, and the practical implications of each driver category for traders, market makers, and platforms.
Core Concept
SettleRisk's scoring engine is closed-form and deterministic. The arithmetic:
    raw_points = BASE_POINTS(platform)
               + sum(driver.points_contribution for driver in drivers)
               - mitigation_points
               + complexity_points
    aggregate_risk_score = clamp(round(raw_points), 0, 100)
BASE_POINTS(platform) is 12 for Polymarket and 8 for Kalshi, reflecting baseline dispute infrastructure quality. The driver list is capped at 15 entries sorted by points_contribution descending. Mitigation and complexity adjustments are bounded.
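As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function name and signature are hypothetical (they are not part of the SettleRisk SDK), and the rounding mode is an assumption: Python's built-in `round` rounds halves to the nearest even integer, and this post does not say which mode the engine uses.

```python
# Base points per platform, as stated in the post.
BASE_POINTS = {"polymarket": 12, "kalshi": 8}


def aggregate_risk_score(platform, driver_points, mitigation_points=0.0, complexity_points=0.0):
    """Combine base, driver, mitigation and complexity points into a 0-100 score.

    Hypothetical helper for illustration only; it mirrors the formula in the post.
    """
    raw_points = (
        BASE_POINTS[platform]
        + sum(driver_points)
        - mitigation_points
        + complexity_points
    )
    # clamp(round(raw_points), 0, 100); half-to-even rounding is an assumption.
    return max(0, min(100, round(raw_points)))


if __name__ == "__main__":
    # Base 12 plus three driver contributions, with no adjustments applied.
    print(aggregate_risk_score("polymarket", [18, 11, 9]))  # 50
```

Because the final step clamps, two markets with very different raw totals can both report 100, which is one more reason to read the driver list rather than the aggregate alone.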
The 15 driver types and their default base contributions:
| Driver | Base points | Category |
|--------|-------------|----------|
| AMBIGUOUS_WORDING | 15 | Linguistic |
| RETROACTIVE_RULE_CHANGE | 18 | Governance |
| REGULATORY_RISK | 16 | External |
| SUBJECTIVE_JUDGMENT | 14 | Linguistic |
| COUNTERPARTY_RISK | 13 | Operational |
| PRECEDENT_CONFLICT | 12 | Governance |
| SINGLE_ORACLE_DEPENDENCY | 12 | Operational |
| TEMPORAL_AMBIGUITY | 11 | Linguistic |
| INFORMATION_ASYMMETRY | 11 | Operational |
| METRIC_DEFINITION | 10 | Linguistic |
| MULTI_STEP_RESOLUTION | 10 | Structural |
| EXTERNAL_DEPENDENCY | 9 | Structural |
| TIME_PRESSURE | 8 | Operational |
| EDGE_CASE | 8 | Structural |
| GEOGRAPHIC_AMBIGUITY | 7 | Linguistic |
Each driver's actual contribution is base_points * strength_multiplier * confidence, where strength_multiplier is set by the driver's strength (HIGH, MEDIUM, or LOW) and confidence is a value in [0, 1] from the LLM extraction layer.
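The per-driver formula can be sketched as follows. The base points come from the table above, but the strength multipliers are placeholders: this post does not publish them, and the values here were picked only because they happen to reproduce the rounded contributions in the worked example below. Treat `STRENGTH_MULTIPLIER` and `driver_contribution` as hypothetical names, not SettleRisk internals.

```python
# Default base contributions, copied from the driver table in the post.
BASE_DRIVER_POINTS = {
    "AMBIGUOUS_WORDING": 15,
    "RETROACTIVE_RULE_CHANGE": 18,
    "REGULATORY_RISK": 16,
    "SUBJECTIVE_JUDGMENT": 14,
    "COUNTERPARTY_RISK": 13,
    "PRECEDENT_CONFLICT": 12,
    "SINGLE_ORACLE_DEPENDENCY": 12,
    "TEMPORAL_AMBIGUITY": 11,
    "INFORMATION_ASYMMETRY": 11,
    "METRIC_DEFINITION": 10,
    "MULTI_STEP_RESOLUTION": 10,
    "EXTERNAL_DEPENDENCY": 9,
    "TIME_PRESSURE": 8,
    "EDGE_CASE": 8,
    "GEOGRAPHIC_AMBIGUITY": 7,
}

# ASSUMPTION: placeholder multipliers, not published SettleRisk values.
STRENGTH_MULTIPLIER = {"HIGH": 1.3, "MEDIUM": 1.15, "LOW": 1.0}


def driver_contribution(driver_type, strength, confidence):
    """Return base_points * strength_multiplier * confidence, unrounded."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return BASE_DRIVER_POINTS[driver_type] * STRENGTH_MULTIPLIER[strength] * confidence


if __name__ == "__main__":
    for args in [
        ("AMBIGUOUS_WORDING", "HIGH", 0.92),
        ("TEMPORAL_AMBIGUITY", "MEDIUM", 0.85),
        ("METRIC_DEFINITION", "MEDIUM", 0.78),
    ]:
        print(args[0], round(driver_contribution(*args)))
```

Note that confidence scales the contribution down, so a driver the extraction layer is unsure about adds fewer points than its base value suggests.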
Worked Example
An excerpt from a real Polymarket resolution rule:
"This market will resolve YES if Bitcoin reaches approximately $100,000 by the end of day Friday, December 31, 2026. Resolution will use prices from major exchanges as determined by the platform."
SettleRisk extracts:
    {
      "drivers": [
        {
          "driver_type": "AMBIGUOUS_WORDING",
          "strength": "HIGH",
          "confidence": 0.92,
          "points_contribution": 18,
          "evidence": {
            "text_span": "approximately $100,000",
            "start_char": 47,
            "end_char": 69
          }
        },
        {
          "driver_type": "TEMPORAL_AMBIGUITY",
          "strength": "MEDIUM",
          "confidence": 0.85,
          "points_contribution": 11,
          "evidence": {
            "text_span": "by the end of day Friday",
            "start_char": 74,
            "end_char": 98
          }
        },
        {
          "driver_type": "METRIC_DEFINITION",
          "strength": "MEDIUM",
          "confidence": 0.78,
          "points_contribution": 9,
          "evidence": {
            "text_span": "prices from major exchanges as determined by the platform",
            "start_char": 134,
            "end_char": 192
          }
        }
      ],
      "aggregate_risk_score": 67,
      "tier": "HIGH"
    }
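The arithmetic: 12 (Polymarket base) + 18 + 11 + 9 = 50 from the base and the three drivers shown. The remaining 17 points needed to reach 67 are the net of complexity_points and mitigation_points, which this response excerpt does not itemize. Each driver points to the exact character span in the rules that triggered it. This is what makes the score actionable: a trader can read the rule themselves and confirm the call.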
Python:

    from settlerisk import SettleRiskClient

    client = SettleRiskClient(api_key="sk-...")
    score = client.get_risk_score("polymarket", "0xbtc100k")

    # One row per driver: type, points, and the rule text that triggered it.
    for driver in score.drivers:
        print(f"{driver.driver_type:30s} {driver.points_contribution:>3} '{driver.evidence.text_span}'")
TypeScript:

    import { SettleRiskClient } from "settlerisk";

    const client = new SettleRiskClient({ apiKey: "sk-..." });
    const score = await client.getRiskScore("polymarket", "0xbtc100k");

    // One row per driver: type, points, and the rule text that triggered it.
    for (const d of score.drivers) {
      console.log(`${d.driverType.padEnd(30)} ${d.pointsContribution.toString().padStart(3)} '${d.evidence.textSpan}'`);
    }
Implementation Notes
Drivers are sorted and capped at 15. The cap is on attribution, not extraction. The LLM may identify 20 issues; the API returns the top 15 by contribution. If you need the full extraction, hit /v1/evaluate-rules directly.
Tiebreakers are explicit. When two drivers tie on points_contribution, sort by (1) higher confidence, (2) higher strength, (3) alphabetical by driver_type. This ensures deterministic ordering across requests.
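The ordering and cap described above can be sketched as a single composite sort key. This is an illustration under assumptions, not the engine's code: the `Driver` dataclass, `STRENGTH_RANK`, and `rank_drivers` are hypothetical names, and the field names are borrowed from the JSON response shown earlier.

```python
from dataclasses import dataclass

# Hypothetical numeric ranking so "higher strength" can be compared.
STRENGTH_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}
MAX_DRIVERS = 15  # attribution cap stated in the post


@dataclass(frozen=True)
class Driver:
    driver_type: str
    strength: str
    confidence: float
    points_contribution: float


def rank_drivers(drivers):
    """Sort by contribution, then the three tiebreakers, then cap the list."""
    ordered = sorted(
        drivers,
        key=lambda d: (
            -d.points_contribution,      # primary: highest contribution first
            -d.confidence,               # (1) higher confidence
            -STRENGTH_RANK[d.strength],  # (2) higher strength
            d.driver_type,               # (3) alphabetical by driver_type
        ),
    )
    return ordered[:MAX_DRIVERS]


if __name__ == "__main__":
    tied = [
        Driver("TIME_PRESSURE", "LOW", 0.90, 8),
        Driver("EXTERNAL_DEPENDENCY", "HIGH", 0.90, 8),
        Driver("EDGE_CASE", "HIGH", 0.90, 8),
        Driver("METRIC_DEFINITION", "LOW", 0.95, 8),
    ]
    for d in rank_drivers(tied):
        print(d.driver_type)
```

Because every component of the key is fully determined by the driver's own fields, the same input always yields the same order, regardless of the order in which the extraction layer emitted the drivers.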
Evidence spans are inclusive-exclusive 0-based character offsets into the canonicalized rules text. To highlight in your UI, use:
    def highlight(rules_text: str, start: int, end: int) -> str:
        # Wrap the half-open span [start, end) in [[...]] markers.
        return rules_text[:start] + f"[[{rules_text[start:end]}]]" + rules_text[end:]
Don't filter drivers by category in scoring. All 15 driver types, across all five categories, contribute to the final score. If you want to surface only Linguistic drivers in your UI, that is a presentational choice; do not subtract the hidden drivers from the score.
| Use case | Recommended filter |
|----------|--------------------|
| Trader-facing UI | Top 5 drivers by contribution |
| MM internal review | All drivers, sorted by contribution |
| Compliance audit | All drivers with full evidence spans |
| Platform integration | Top 3 + driver category counts |
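To make the "filter for display, not for scoring" distinction concrete, here is a small sketch of the platform-integration row: top 3 drivers plus category counts. The `platform_view` helper is hypothetical, drivers are assumed to be plain dicts shaped like the JSON response, and counting categories over all returned drivers (rather than only the top 3) is an assumption, since the table does not specify.

```python
from collections import Counter

# Driver type -> category, copied from the taxonomy table in the post.
CATEGORY = {
    "AMBIGUOUS_WORDING": "Linguistic",
    "RETROACTIVE_RULE_CHANGE": "Governance",
    "REGULATORY_RISK": "External",
    "SUBJECTIVE_JUDGMENT": "Linguistic",
    "COUNTERPARTY_RISK": "Operational",
    "PRECEDENT_CONFLICT": "Governance",
    "SINGLE_ORACLE_DEPENDENCY": "Operational",
    "TEMPORAL_AMBIGUITY": "Linguistic",
    "INFORMATION_ASYMMETRY": "Operational",
    "METRIC_DEFINITION": "Linguistic",
    "MULTI_STEP_RESOLUTION": "Structural",
    "EXTERNAL_DEPENDENCY": "Structural",
    "TIME_PRESSURE": "Operational",
    "EDGE_CASE": "Structural",
    "GEOGRAPHIC_AMBIGUITY": "Linguistic",
}


def platform_view(drivers, top_n=3):
    """Build a display summary; the aggregate score is never recomputed here."""
    # sorted() is stable, so drivers tied on contribution keep their API order.
    ranked = sorted(drivers, key=lambda d: d["points_contribution"], reverse=True)
    counts = Counter(CATEGORY[d["driver_type"]] for d in drivers)
    return {"top_drivers": ranked[:top_n], "category_counts": dict(counts)}


if __name__ == "__main__":
    drivers = [
        {"driver_type": "AMBIGUOUS_WORDING", "points_contribution": 18},
        {"driver_type": "TEMPORAL_AMBIGUITY", "points_contribution": 11},
        {"driver_type": "METRIC_DEFINITION", "points_contribution": 9},
    ]
    print(platform_view(drivers))
```

The helper only selects and counts. It takes no score as input and returns none, so there is no path by which a display filter can alter aggregate_risk_score.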
Failure Modes
1. Confusing strength with confidence. Strength is the impact magnitude (LOW/MEDIUM/HIGH). Confidence is the LLM's certainty that it identified the right driver. A HIGH-strength, LOW-confidence driver is a different signal from a LOW-strength, HIGH-confidence one.
2. Reusing cached drivers after a taxonomy change. If the rules text didn't change but the driver taxonomy did, the cached drivers are stale and you need to re-extract. Check score.version.driver_taxonomy_version and force a refresh when it bumps.
3. Treating evidence spans as byte offsets. They are character offsets in the canonicalized text. Whenever the text contains multi-byte UTF-8 sequences (any non-ASCII character), byte offsets diverge from character offsets. If your runtime indexes strings by byte, build a char-boundary table and convert before slicing.
4. Filtering out drivers based on category. All 15 driver types, whatever their category, aggregate into the score. Filtering for display is fine; filtering the math is not.
5. Hard-coding driver thresholds. Driver contributions can change between driver_taxonomy_version releases. Any threshold logic in your code (e.g. "alert if AMBIGUOUS_WORDING > 12") should reference the taxonomy version it was calibrated against.
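Failure mode 3 is easy to reproduce. The sketch below converts a character span into a UTF-8 byte span for consumers that store or slice the rules text as bytes. It assumes "character" means Unicode code point, which is how Python indexes `str`; this post does not state the unit explicitly, so confirm it against the methodology before relying on it. `char_span_to_byte_span` is a hypothetical helper.

```python
def char_span_to_byte_span(text, start, end):
    """Map a half-open character span [start, end) to UTF-8 byte offsets.

    ASSUMPTION: offsets count Unicode code points, as Python str indexing does.
    """
    byte_start = len(text[:start].encode("utf-8"))
    byte_end = byte_start + len(text[start:end].encode("utf-8"))
    return byte_start, byte_end


if __name__ == "__main__":
    rules = "café ≥ $100"
    start, end = 7, 11  # character span covering "$100"
    raw = rules.encode("utf-8")

    b_start, b_end = char_span_to_byte_span(rules, start, end)
    print(rules[start:end])                       # correct: character slice
    print(raw[b_start:b_end].decode("utf-8"))     # correct: converted byte slice
    print(raw[start:end])                         # wrong: char offsets used on bytes
```

The last line slices into the middle of a multi-byte sequence, which is exactly the kind of silent mis-highlight this failure mode describes.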
Checklist
- [ ] Read the full driver list, not just aggregate_risk_score
- [ ] Display points_contribution and evidence.text_span in your UI
- [ ] Persist driver_taxonomy_version with every score
- [ ] Use char offsets (not byte offsets) when highlighting spans
- [ ] Subscribe to score.tier_changed to catch driver-driven tier flips
- [ ] Audit your top-3 drivers monthly against actual disputes
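One way to act on the "persist driver_taxonomy_version" item is to store the version alongside each cached score and compare it before reuse. The sketch below is a minimal illustration: `CachedScore` and `needs_reextraction` are hypothetical names, and the version is treated as an opaque string because this post does not describe its format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedScore:
    market_id: str
    aggregate_risk_score: int
    driver_taxonomy_version: str  # opaque string; format is an assumption


def needs_reextraction(cached, current_taxonomy_version):
    """True when the cached drivers were produced under a different taxonomy."""
    return cached.driver_taxonomy_version != current_taxonomy_version


if __name__ == "__main__":
    # The version strings here are made up for the example.
    cached = CachedScore("0xbtc100k", 67, "v1")
    print(needs_reextraction(cached, "v1"))  # False
    print(needs_reextraction(cached, "v2"))  # True
```

The same stored version can be attached to any alert threshold you calibrate, so a rule tuned against one taxonomy release is not silently applied to another.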
Pull driver attribution on any live market: /signup for a free key.