Driver Attribution: How 15 Typed Risk Factors Combine into One Score

Executive Summary

A single number like "67 / 100, HIGH tier" is easy to display and trivial to ignore. The reason SettleRisk exposes the per-driver breakdown is that a HIGH score caused by AMBIGUOUS_WORDING is a totally different risk profile from a HIGH score caused by RETROACTIVE_RULE_CHANGE. The first is partially mitigable through tight stops; the second is not mitigable at all.

This post walks through the 15-driver taxonomy, the deterministic arithmetic that combines them, and the practical implications of each driver category for traders, market makers, and platforms.

Core Concept

SettleRisk's scoring engine is closed-form and deterministic. The arithmetic:

raw_points = BASE_POINTS(platform)
           + sum(driver.points_contribution for driver in drivers)
           - mitigation_points
           + complexity_points

aggregate_risk_score = clamp(round(raw_points), 0, 100)

BASE_POINTS(platform) is 12 for Polymarket and 8 for Kalshi, reflecting baseline dispute infrastructure quality. The driver list is capped at 15 entries sorted by points_contribution descending. Mitigation and complexity adjustments are bounded.
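As a minimal sketch of the arithmetic above, the following Python function combines the four terms and clamps the result. The function name, its parameters, and the dictionary layout are illustrative assumptions rather than SettleRisk's implementation; only the formula and the two base values come from this post. Note that Python's built-in round() rounds halves to the nearest even integer, and the post does not say which rounding mode the engine uses.

```python
# Illustrative sketch only: names and structure are assumptions, not the SettleRisk engine.
BASE_POINTS = {"polymarket": 12, "kalshi": 8}  # base values as stated in this post


def aggregate_risk_score(platform, driver_points, mitigation_points=0.0, complexity_points=0.0):
    """Combine base, driver, mitigation and complexity terms, then clamp to 0..100."""
    raw_points = (
        BASE_POINTS[platform]
        + sum(driver_points)
        - mitigation_points
        + complexity_points
    )
    # Python's round() rounds halves to even; the engine's rounding mode is not stated in the post.
    return max(0, min(100, round(raw_points)))
```

With no adjustments, a Kalshi market with no drivers scores its base of 8, and any raw total above 100 is reported as 100.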

The 15 driver types and their default base contributions:

| Driver | Base points | Category |
|--------|-------------|----------|
| AMBIGUOUS_WORDING | 15 | Linguistic |
| RETROACTIVE_RULE_CHANGE | 18 | Governance |
| REGULATORY_RISK | 16 | External |
| SUBJECTIVE_JUDGMENT | 14 | Linguistic |
| COUNTERPARTY_RISK | 13 | Operational |
| PRECEDENT_CONFLICT | 12 | Governance |
| SINGLE_ORACLE_DEPENDENCY | 12 | Operational |
| TEMPORAL_AMBIGUITY | 11 | Linguistic |
| INFORMATION_ASYMMETRY | 11 | Operational |
| METRIC_DEFINITION | 10 | Linguistic |
| MULTI_STEP_RESOLUTION | 10 | Structural |
| EXTERNAL_DEPENDENCY | 9 | Structural |
| TIME_PRESSURE | 8 | Operational |
| EDGE_CASE | 8 | Structural |
| GEOGRAPHIC_AMBIGUITY | 7 | Linguistic |

Each driver's actual contribution is base_points * strength_multiplier * confidence, where strength_multiplier is determined by the driver's strength (HIGH, MEDIUM, or LOW) and confidence is a value in [0, 1] reported by the LLM extraction layer.
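The per-driver formula can be sketched in a few lines of Python. The post does not publish the strength multipliers, so the values below are hypothetical: they were chosen only because they reproduce the three points_contribution figures in the worked example later in this post. Rounding each contribution to a whole number is likewise an assumption based on those figures being integers.

```python
# Hypothetical multipliers, picked only to match the worked example in this post.
# The real values are not published here; LOW in particular is a placeholder.
STRENGTH_MULTIPLIER = {"HIGH": 1.30, "MEDIUM": 1.15, "LOW": 1.00}


def points_contribution(base_points, strength, confidence):
    """base_points * strength_multiplier * confidence, rounded to whole points."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return round(base_points * STRENGTH_MULTIPLIER[strength] * confidence)


print(points_contribution(15, "HIGH", 0.92))    # AMBIGUOUS_WORDING
print(points_contribution(11, "MEDIUM", 0.85))  # TEMPORAL_AMBIGUITY
print(points_contribution(10, "MEDIUM", 0.78))  # METRIC_DEFINITION
```

Because confidence scales the result, a driver with a high base value and low confidence can contribute fewer points than a weaker driver the extractor is sure about.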

Worked Example

A real Polymarket market resolution rule excerpt:

"This market will resolve YES if Bitcoin reaches approximately $100,000 by the end of day Friday, December 31, 2026. Resolution will use prices from major exchanges as determined by the platform."

SettleRisk extracts:

{
  "drivers": [
    {
      "driver_type": "AMBIGUOUS_WORDING",
      "strength": "HIGH",
      "confidence": 0.92,
      "points_contribution": 18,
      "evidence": {
        "text_span": "approximately $100,000",
        "start_char": 47,
        "end_char": 69
      }
    },
    {
      "driver_type": "TEMPORAL_AMBIGUITY",
      "strength": "MEDIUM",
      "confidence": 0.85,
      "points_contribution": 11,
      "evidence": {
        "text_span": "by the end of day Friday",
        "start_char": 74,
        "end_char": 98
      }
    },
    {
      "driver_type": "METRIC_DEFINITION",
      "strength": "MEDIUM",
      "confidence": 0.78,
      "points_contribution": 9,
      "evidence": {
        "text_span": "prices from major exchanges as determined by the platform",
        "start_char": 134,
        "end_char": 192
      }
    }
  ],
  "aggregate_risk_score": 67,
  "tier": "HIGH"
}

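The arithmetic: the listed terms sum to 12 (Polymarket base) + 18 + 11 + 9 = 50. The remaining 17 points needed to reach 67 come from complexity_points net of mitigation_points, which this response excerpt does not itemize. Each driver points to the exact character span in the rules that triggered it. This is what makes the score actionable: a trader can read the rule themselves and confirm the call.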
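A quick way to audit a response like this is to sum the terms it lists and see what is left over. The sketch below uses only the figures from the example above; the variable names are illustrative, and it assumes the reported score was not affected by clamping (67 is inside the 0 to 100 range).

```python
# Figures copied from the worked example in this post.
base_points = 12               # Polymarket base
driver_points = [18, 11, 9]    # points_contribution of the three drivers
reported_score = 67            # aggregate_risk_score in the response

listed_total = base_points + sum(driver_points)
# Whatever is left must come from complexity_points minus mitigation_points,
# which the response excerpt does not itemize.
net_adjustment = reported_score - listed_total

print(f"listed terms: {listed_total}, net adjustment: {net_adjustment}")
```

Running the same subtraction on live responses is a cheap consistency check: a large unexplained remainder is worth investigating before trusting the per-driver breakdown.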

from settlerisk import SettleRiskClient

client = SettleRiskClient(api_key="sk-...")
score = client.get_risk_score("polymarket", "0xbtc100k")

for driver in score.drivers:
    print(f"{driver.driver_type:30s}  {driver.points_contribution:>3}  '{driver.evidence.text_span}'")

import { SettleRiskClient } from "settlerisk";

const client = new SettleRiskClient({ apiKey: "sk-..." });
const score = await client.getRiskScore("polymarket", "0xbtc100k");

for (const d of score.drivers) {
  console.log(`${d.driverType.padEnd(30)}  ${d.pointsContribution.toString().padStart(3)}  '${d.evidence.textSpan}'`);
}

Implementation Notes

Drivers are sorted and capped at 15. The cap is on attribution, not extraction. The LLM may identify 20 issues; the API returns the top 15 by contribution. If you need the full extraction, hit /v1/evaluate-rules directly.

Tiebreakers are explicit. When two drivers tie on points_contribution, sort by (1) higher confidence, (2) higher strength, (3) alphabetical by driver_type. This ensures deterministic ordering across requests.
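The ordering and cap described above map naturally onto a composite sort key. This is a sketch under assumptions: drivers are represented as plain dictionaries with the field names from the JSON response, and the numeric strength ranks are an illustrative encoding, not part of the API.

```python
STRENGTH_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}  # illustrative encoding
MAX_DRIVERS = 15


def order_drivers(drivers):
    """Sort by contribution, then confidence, then strength (all descending),
    then driver_type alphabetically, and keep at most 15 entries."""
    def key(d):
        return (
            -d["points_contribution"],
            -d["confidence"],
            -STRENGTH_RANK[d["strength"]],
            d["driver_type"],
        )
    return sorted(drivers, key=key)[:MAX_DRIVERS]
```

Negating the numeric fields lets a single ascending sort handle three descending criteria and one ascending one, which keeps the ordering deterministic regardless of input order.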

Evidence spans are 0-based, half-open character offsets into the canonicalized rules text: start_char is inclusive and end_char is exclusive. To highlight in your UI, use:

def highlight(rules_text: str, start: int, end: int) -> str:
    return rules_text[:start] + f"[[{rules_text[start:end]}]]" + rules_text[end:]

Don't filter drivers by category in scoring. All 15 driver types, across all five categories, contribute to the final score. If you want to surface only Linguistic drivers in your UI, that is a presentational choice; do not subtract the hidden drivers from the score.

| Use case | Recommended filter |
|----------|--------------------|
| Trader-facing UI | Top 5 drivers by contribution |
| MM internal review | All drivers, sorted by contribution |
| Compliance audit | All drivers with full evidence spans |
| Platform integration | Top 3 + driver category counts |
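The platform-integration row can be sketched as a pure display transform that never touches the score. The function name and return shape are assumptions for illustration; the category mapping is copied from the taxonomy table in this post, and drivers are assumed to be dictionaries with the response's field names.

```python
from collections import Counter

# Driver type -> category, as listed in the taxonomy table in this post.
CATEGORY = {
    "AMBIGUOUS_WORDING": "Linguistic", "RETROACTIVE_RULE_CHANGE": "Governance",
    "REGULATORY_RISK": "External", "SUBJECTIVE_JUDGMENT": "Linguistic",
    "COUNTERPARTY_RISK": "Operational", "PRECEDENT_CONFLICT": "Governance",
    "SINGLE_ORACLE_DEPENDENCY": "Operational", "TEMPORAL_AMBIGUITY": "Linguistic",
    "INFORMATION_ASYMMETRY": "Operational", "METRIC_DEFINITION": "Linguistic",
    "MULTI_STEP_RESOLUTION": "Structural", "EXTERNAL_DEPENDENCY": "Structural",
    "TIME_PRESSURE": "Operational", "EDGE_CASE": "Structural",
    "GEOGRAPHIC_AMBIGUITY": "Linguistic",
}


def platform_view(drivers, top_n=3):
    """Return the top-N drivers plus per-category counts; the score is left alone."""
    ranked = sorted(drivers, key=lambda d: d["points_contribution"], reverse=True)
    counts = Counter(CATEGORY[d["driver_type"]] for d in drivers)
    return {"top": ranked[:top_n], "category_counts": dict(counts)}
```

The category counts are computed over every returned driver, not just the top three, so the summary still reflects drivers that the UI chooses not to list.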

Failure Modes

1. Confusing strength with confidence. Strength is the impact magnitude (LOW/MEDIUM/HIGH). Confidence is the LLM's certainty that it identified the right driver. A HIGH-strength, LOW-confidence driver is a different signal from a LOW-strength, HIGH-confidence one.

2. Reusing cached drivers after a taxonomy change. If the rules text didn't change but the driver taxonomy did, the cached drivers are stale and you need to re-extract. Check score.version.driver_taxonomy_version and force a refresh when it bumps.

3. Treating evidence spans as byte offsets. They are character offsets into the canonicalized text. In UTF-8, every non-ASCII character takes more than one byte, so byte offsets and character offsets diverge as soon as the rules contain one. If your tooling works on bytes, convert through a character-to-byte boundary table.

4. Filtering out drivers based on category. Drivers from every category aggregate into the score. Filtering for display is fine; filtering the math is not.

5. Hard-coding driver thresholds. Driver contributions can change between driver_taxonomy_version releases. Any threshold logic in your code (e.g. "alert if AMBIGUOUS_WORDING > 12") should reference the taxonomy version it was calibrated against.
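To make failure mode 3 concrete, here is a small sketch that converts a character span into a UTF-8 byte span. It assumes offsets count Unicode code points, which is how Python indexes strings; the post does not say which unit the canonicalized text uses, so verify that against real responses. The helper name and the sample text are made up for illustration.

```python
def char_span_to_byte_span(text, start_char, end_char):
    """Convert a half-open character span into a half-open UTF-8 byte span."""
    start_byte = len(text[:start_char].encode("utf-8"))
    end_byte = start_byte + len(text[start_char:end_char].encode("utf-8"))
    return start_byte, end_byte


# Made-up rules text containing two non-ASCII characters (e-acute and >=).
rules = "R\u00e9solution: BTC \u2265 $100,000"
start = rules.index("BTC")
end = len(rules)

byte_start, byte_end = char_span_to_byte_span(rules, start, end)
print((start, end), (byte_start, byte_end))
```

The two spans differ here because the accented letter before the span shifts the byte start, and the comparison sign inside the span widens it. Slicing the encoded bytes with the character offsets would cut the wrong region.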

Checklist

[ ] Read the full driver list, not just aggregate_risk_score
[ ] Display points_contribution and evidence.text_span in your UI
[ ] Persist driver_taxonomy_version with every score
[ ] Use char offsets (not byte offsets) when highlighting spans
[ ] Subscribe to score.tier_changed to catch driver-driven tier flips
[ ] Audit your top-3 drivers monthly against actual disputes
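The item about persisting driver_taxonomy_version can be turned into a simple staleness check on your side. This is a sketch with assumed names: the CachedScore record, the rules_hash field, and the version strings in the usage lines are hypothetical, and how you obtain the current taxonomy version depends on your integration.

```python
from dataclasses import dataclass


@dataclass
class CachedScore:
    rules_hash: str               # hash of the rules text you scored (hypothetical field)
    driver_taxonomy_version: str  # persisted from score.version.driver_taxonomy_version
    aggregate_risk_score: int


def needs_refresh(cached, current_rules_hash, current_taxonomy_version):
    """True when either the rules text or the driver taxonomy has changed."""
    return (
        cached.rules_hash != current_rules_hash
        or cached.driver_taxonomy_version != current_taxonomy_version
    )


cached = CachedScore(rules_hash="abc123", driver_taxonomy_version="v1", aggregate_risk_score=67)
print(needs_refresh(cached, "abc123", "v1"))  # same rules, same taxonomy
print(needs_refresh(cached, "abc123", "v2"))  # same rules, taxonomy bumped
```

Keying the cache on both values avoids the case where unchanged rules text hides a taxonomy bump.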

Pull driver attribution on any live market: /signup for a free key.