Ambiguous Wording Detection: Linguistic Patterns That Predict Disputes

Executive Summary

Resolution disputes look like one-off events. They aren't. Audit a sample of disputed Polymarket and Kalshi markets and the same seven linguistic patterns show up across most of them: hedge words, undefined comparators, missing timezones, qualitative thresholds, undefined entities, conditional escape clauses, and source-of-truth ambiguity. This post catalogues each pattern, shows the exact SettleRisk detection logic, and explains how to read a contract for these patterns before you trade it.

Core Concept

The AMBIGUOUS_WORDING driver in SettleRisk's taxonomy is decomposed by the LLM extractor into seven sub-patterns:

| Pattern | Example | Sub-points |
|---------|---------|-----------|
| Hedge word | "approximately", "around", "roughly" | 6 |
| Undefined comparator | "significantly", "substantially" | 7 |
| Missing timezone | "end of day Friday" with no tz | 5 |
| Qualitative threshold | "meaningful", "material" | 6 |
| Undefined entity | "the official source" with no name | 5 |
| Conditional escape | "unless [vague condition]" | 8 |
| Source ambiguity | "as determined by [multiple sources]" | 6 |

Each detected sub-pattern adds its sub-points to the AMBIGUOUS_WORDING driver's points_contribution, capped at the driver's max_points (22 in v1.0).
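The capping rule is easy to check by hand. The sketch below is a minimal illustration, not SettleRisk's implementation: the sub-point values come from the table and the 22-point cap from the text, but the function name, most of the pattern identifiers, and the choice to count each pattern type once are assumptions.

```python
# Sub-points per pattern, copied from the table above.
# Only qualitative_threshold and source_ambiguity are identifiers seen in
# SettleRisk output; the other keys are assumed names for illustration.
SUB_POINTS = {
    "hedge_word": 6,
    "undefined_comparator": 7,
    "missing_timezone": 5,
    "qualitative_threshold": 6,
    "undefined_entity": 5,
    "conditional_escape": 8,
    "source_ambiguity": 6,
}
MAX_POINTS = 22  # driver cap in v1.0


def points_contribution(detected: list[str]) -> int:
    """Sum sub-points for the detected patterns, capped at MAX_POINTS."""
    # Assumption: each pattern type counts once, however many spans matched.
    raw = sum(SUB_POINTS[name] for name in set(detected))
    return min(raw, MAX_POINTS)


print(points_contribution(["qualitative_threshold", "source_ambiguity"]))  # 12
print(points_contribution(list(SUB_POINTS)))  # raw sum is 43, capped to 22
```

Because the seven sub-point values sum to 43, the cap starts to bind once roughly four patterns fire on the same rule.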

Worked Example

Consider this real Polymarket contract text:

"Resolves YES if the Federal Reserve announces a meaningful policy shift on quantitative tightening by Q2 2026, as determined by official Fed communications."

Three patterns fire, two of them under AMBIGUOUS_WORDING:

Qualitative threshold: "meaningful policy shift" (no quantitative bound)
Temporal ambiguity: "by Q2 2026" (which day exactly — last day of the quarter? Last FOMC meeting?)
Source ambiguity: "official Fed communications" (FOMC statement? Press conference? Speech by Powell? Dot plot?)

from settlerisk import SettleRiskClient

client = SettleRiskClient(api_key="sk-...")
extraction = client.evaluate_rules(
    platform="polymarket",
    rules_text="Resolves YES if the Federal Reserve announces a meaningful "
               "policy shift on quantitative tightening by Q2 2026, as "
               "determined by official Fed communications.",
)

for driver in extraction.drivers:
    if driver.driver_type == "AMBIGUOUS_WORDING":
        for pattern in driver.sub_patterns or []:  # guard against a missing list, as the TypeScript version does
            print(f"  - {pattern.type}: '{pattern.text_span}' ({pattern.sub_points} pts)")

Output:

  - qualitative_threshold: 'meaningful policy shift' (6 pts)
  - source_ambiguity: 'official Fed communications' (6 pts)

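Only two lines print. Temporal ambiguity gets logged under TEMPORAL_AMBIGUITY rather than under AMBIGUOUS_WORDING, because the taxonomy treats temporal issues as a distinct category for clearer attribution. The two patterns that do fire contribute 6 + 6 = 12 points, well under the driver's 22-point cap.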

In TypeScript:

import { SettleRiskClient } from "settlerisk";

const client = new SettleRiskClient({ apiKey: "sk-..." });
const extraction = await client.evaluateRules({
  platform: "polymarket",
  rulesText:
    "Resolves YES if the Federal Reserve announces a meaningful policy shift " +
    "on quantitative tightening by Q2 2026, as determined by official Fed communications.",
});

for (const driver of extraction.drivers) {
  if (driver.driverType === "AMBIGUOUS_WORDING") {
    for (const p of driver.subPatterns ?? []) {
      console.log(`  - ${p.type}: '${p.textSpan}' (${p.subPoints} pts)`);
    }
  }
}

Implementation Notes

Hedge words are the single most common pattern. "Approximately" alone shows up in ~14% of disputed markets. If you see it in a rule, treat the market as MEDIUM-tier or higher until the full driver block shows that the underlying metric is precise.

Source ambiguity is the highest-loss pattern. Markets where the source-of-truth is plural ("official communications", "major exchanges") account for the largest median dispute size in dollars. The resolver has to pick one source, and the side whose preferred source was not picked can credibly argue that the choice was procedurally wrong.

Conditional escape clauses are silent killers. Phrases like "unless extraordinary circumstances" or "subject to platform discretion" give the venue unilateral resolution power. They look fine in isolation; they devastate predictability.
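A rough way to surface candidate escape clauses for manual review is a phrase-level regex. This is a sketch under loud assumptions: the trigger phrases are the two quoted above plus a few guessed variants, `find_escape_clauses` is a hypothetical helper, and none of this is SettleRisk's detection logic, which works at the clause level.

```python
import re

# Illustrative trigger phrases only: the two quoted in the text plus a few
# assumed variants. This is not SettleRisk's actual detection logic.
ESCAPE_PATTERNS = [
    r"\bunless\b[^.;]*",                      # "unless ..." up to the end of the clause
    r"\bsubject to\b[^.;]*\bdiscretion\b",    # "subject to platform discretion"
    r"\b(?:sole|absolute) discretion\b",
    r"\bextraordinary circumstances\b",
]
_ESCAPE_RE = re.compile(
    "|".join(f"(?:{p})" for p in ESCAPE_PATTERNS), re.IGNORECASE
)


def find_escape_clauses(text: str) -> list[str]:
    """Return the text of each candidate escape clause, in order of appearance."""
    return [m.group(0).strip() for m in _ESCAPE_RE.finditer(text)]


rule = (
    "Resolves YES if the index closes above 100, unless extraordinary "
    "circumstances apply. Resolution is subject to platform discretion."
)
for clause in find_escape_clauses(rule):
    print(f"  candidate escape clause: '{clause}'")
```

Every "unless" is flagged, including ones followed by a precise condition, so treat the hits as a reading list for a human reviewer rather than as a verdict.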

| Pattern | Detection difficulty | Dispute frequency |
|---------|---------------------|-------------------|
| Hedge word | Easy (lexicon) | High |
| Source ambiguity | Medium (NER required) | High |
| Conditional escape | Hard (clause-level parse) | Medium |
| Qualitative threshold | Medium (qualitative adjective list) | Medium |
| Missing timezone | Easy (datetime parse) | Medium |
| Undefined entity | Hard (entity resolution) | Low |
| Undefined comparator | Easy (lexicon) | Low |
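The missing-timezone row can be approximated without a full datetime parser. The sketch below uses a regex heuristic instead: it flags text that names a time of day but contains no timezone token. The vocabulary in both patterns and the `missing_timezone` helper are assumptions for illustration, limited to a handful of US and universal zone labels.

```python
import re

# Assumed, non-exhaustive vocabulary of time-of-day phrases.
DEADLINE_RE = re.compile(
    r"\b(?:end of day|close of business|midnight|noon"
    r"|\d{1,2}(?::\d{2})?\s*(?:am|pm)|\d{1,2}:\d{2})\b",
    re.IGNORECASE,
)
# Assumed, non-exhaustive timezone tokens. Case-sensitive on purpose, so
# ordinary lowercase words are not mistaken for zone abbreviations.
TZ_RE = re.compile(
    r"\b(?:UTC|GMT|E[SD]?T|C[SD]?T|M[SD]?T|P[SD]?T"
    r"|(?:Eastern|Central|Mountain|Pacific) Time)\b"
)


def missing_timezone(text: str) -> bool:
    """True if the text names a time of day but no timezone token."""
    return bool(DEADLINE_RE.search(text)) and not TZ_RE.search(text)


print(missing_timezone("Resolves at end of day Friday."))      # True
print(missing_timezone("Resolves at 11:59 PM ET on Friday."))  # False
```

Date-level ambiguity such as "by Q2 2026" is out of scope here, since that phrase contains no time of day at all.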

For your own use, you can pre-screen rules text with a hedge-word lexicon and skip any market that produces a hit unless you have a specific reason to trade it:

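import re

HEDGE_WORDS = {"approximately", "around", "roughly", "near", "circa"}
# Match whole words only, so "near" does not fire on "linear" or "nearly".
_HEDGE_RE = re.compile(r"\b(?:" + "|".join(sorted(HEDGE_WORDS)) + r")\b", re.IGNORECASE)

def has_hedge_words(text: str) -> bool:
    return _HEDGE_RE.search(text) is not None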

That alone catches ~30% of the markets SettleRisk flags as HIGH or CRITICAL on the AMBIGUOUS_WORDING driver.
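The same idea extends to the other lexicon-friendly patterns, and it is more useful when it reports where each hit occurs. A minimal sketch, assuming lexicons seeded only from the example words in the pattern table; the `prescreen` function and the pattern keys are hypothetical names, not part of the SettleRisk client.

```python
import re

# Lexicons seeded from the examples in the pattern table; extend for your own use.
LEXICONS = {
    "hedge_word": ["approximately", "around", "roughly"],
    "undefined_comparator": ["significantly", "substantially"],
    "qualitative_threshold": ["meaningful", "material"],
}


def prescreen(text: str) -> list[tuple[str, str, int]]:
    """Return (pattern, matched_word, start_offset) for every lexicon hit."""
    hits = []
    for pattern, words in LEXICONS.items():
        # Whole-word, case-insensitive match for any word in this lexicon.
        regex = re.compile(r"\b(?:" + "|".join(words) + r")\b", re.IGNORECASE)
        for m in regex.finditer(text):
            hits.append((pattern, m.group(0), m.start()))
    # Order hits by position in the text, not by lexicon.
    return sorted(hits, key=lambda h: h[2])


rule = (
    "Resolves YES if the Federal Reserve announces a meaningful policy shift "
    "on quantitative tightening by Q2 2026, as determined by official Fed "
    "communications."
)
for pattern, word, start in prescreen(rule):
    print(f"  - {pattern}: '{word}' at offset {start}")
```

On the Fed rule from the worked example this finds only "meaningful". The source ambiguity in "official Fed communications" is invisible to a word list, which is the gap the LLM extraction step covers.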

Failure Modes

1. Treating the lexicon as exhaustive. New hedge constructions appear constantly. Don't hard-code a finite list as a replacement for the LLM extraction step.

2. Confusing detection with severity. A market can have an "approximately" in the rules and still be LOW-tier if the underlying metric is precise (e.g. "approximately $100K" referring to a CFTC-published statistic). Use the full driver block.

3. Skipping the evidence spans. A reviewer who sees "AMBIGUOUS_WORDING: 18 pts" with no span has no audit trail. Always render the span.

4. Ignoring patterns in non-rule text. Some venues include resolution methodology in linked docs rather than the market description. SettleRisk follows URLs in rules text up to two hops — but only if the URLs are present. If methodology is referenced by name only, the pattern won't fire on that document.

5. Re-using extractions across rule versions. When a market's rules text gets edited, the extraction is invalidated. Subscribe to rules.changed and re-pull.
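One defensive way to avoid stale extractions (failure mode 5) is to key each cached result on a hash of the exact rules text it was built from, so any edit forces a re-pull even if a rules.changed event is missed. This is a sketch: `ExtractionCache` and `rules_fingerprint` are hypothetical helpers, and the `evaluate` callable stands in for whatever wrapper you put around `client.evaluate_rules`.

```python
import hashlib
from typing import Any, Callable


def rules_fingerprint(rules_text: str) -> str:
    """Stable key for one exact version of the rules text."""
    return hashlib.sha256(rules_text.encode("utf-8")).hexdigest()


class ExtractionCache:
    """Caches one extraction per market, valid only for the rules text it came from."""

    def __init__(self, evaluate: Callable[[str], Any]) -> None:
        # evaluate: hypothetical wrapper, e.g. around client.evaluate_rules
        self._evaluate = evaluate
        self._entries: dict = {}  # market_id -> (fingerprint, extraction)

    def get(self, market_id: str, rules_text: str) -> Any:
        key = rules_fingerprint(rules_text)
        cached = self._entries.get(market_id)
        if cached is not None and cached[0] == key:
            return cached[1]  # rules text unchanged: reuse the extraction
        result = self._evaluate(rules_text)  # first sight or edited rules: re-pull
        self._entries[market_id] = (key, result)
        return result
```

The fingerprint is sensitive to whitespace, so a cosmetic edit also triggers a re-pull. That errs on the side of freshness, which seems like the right trade-off when a stale score is the risk.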

Checklist

[ ] Read evidence spans for every AMBIGUOUS_WORDING driver
[ ] Maintain your own hedge-word lexicon as a fast pre-filter
[ ] Flag source ambiguity separately — these have the worst loss distribution
[ ] Watch for conditional escape clauses on platform-governed markets
[ ] Re-pull score on rules.changed events
[ ] Audit the top 10 hedge words in your traded contracts monthly

Sources + Further Reading

SettleRisk methodology — full sub-pattern taxonomy
Driver attribution post — how AMBIGUOUS_WORDING combines with other drivers
Computational Detection of Vagueness in Text (Marneffe et al. 2012)
Polymarket UMA dispute history — pattern frequencies
Kalshi disputed market filings — regulatory record

Try it on a real rule: paste rules text into our /demo — no key required.