Subjective Outcomes: When Markets Resolve by Committee

Executive Summary

When a prediction-market rule requires a human reviewer to decide an outcome — because the threshold is qualitative, the criteria are contested, or the source-of-truth is plural — you are no longer trading the underlying event. You are trading the committee. Markets with the SUBJECTIVE_JUDGMENT driver have the highest mean loss per dispute in the SettleRisk dataset. This post explains why, and what to do about it.

Core Concept

SUBJECTIVE_JUDGMENT fires when SettleRisk's LLM extractor identifies one of these patterns:

| Pattern | Example |
|---------|---------|
| Qualitative threshold | "significantly", "materially", "substantially" |
| Reviewer authority | "as determined by the platform" |
| Vague entity criteria | "major exchange", "leading source" |
| Non-numeric impact | "the policy was effective" |
| Multi-criteria with weights unspecified | "primarily, but also considering..." |

The driver carries 14 base points and a max_points of 26 (the second-highest cap in the taxonomy, after RETROACTIVE_RULE_CHANGE). The reason is empirical: subjective-judgment disputes have a median resolution time of 23 days and a 31% probability of reversal on appeal, both higher than for any other driver.

Worked Example

A real Polymarket market asked "Will the new tariff package significantly impact bilateral trade by Q2 2026?" The word "significantly" alone created a multi-week dispute when trade volume fell 12%. Some traders argued that any double-digit drop qualified; others pointed to historical precedent where the threshold was 20% or more.

from settlerisk import SettleRiskClient

client = SettleRiskClient(api_key="sk-...")
score = client.get_risk_score("polymarket", "tariffs-significant-impact-2026")

# Score: 78, tier: CRITICAL
# p_dispute: 0.312
# Top drivers:
#   SUBJECTIVE_JUDGMENT       21  conf=0.94
#   AMBIGUOUS_WORDING         15  conf=0.88
#   METRIC_DEFINITION          9  conf=0.79

The committee that eventually resolved the market set the threshold retroactively at 20%, locking out traders who had taken the YES side believing 10-15% would qualify. $3.8M sat frozen for 23 days.
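To make the "trading the committee" point concrete, here is a minimal sketch of how the YES probability in this example depends on the threshold the committee adopts rather than on the trade data. The belief weights are illustrative assumptions, not SettleRisk data, and `p_yes_given_committee` is a hypothetical helper, not part of any SDK.

```python
# Illustrative only: the belief weights over thresholds are assumptions.
def p_yes_given_committee(observed_drop_pct: float,
                          threshold_beliefs: dict[float, float]) -> float:
    """Probability of a YES resolution, given an observed drop and a belief
    distribution over the threshold (in %) the committee will adopt."""
    total = sum(threshold_beliefs.values())
    if total <= 0:
        raise ValueError("threshold beliefs must have positive total weight")
    # YES only if the observed drop meets or exceeds the chosen threshold.
    qualifying = sum(weight for threshold, weight in threshold_beliefs.items()
                     if observed_drop_pct >= threshold)
    return qualifying / total

# A trader who puts 60% on a 10% threshold and 40% on a 20% threshold.
beliefs = {10.0: 0.6, 20.0: 0.4}
p_yes = p_yes_given_committee(12.0, beliefs)
```

Once the 12% drop is known, the only uncertainty left is the committee's reading of "significantly". Under these assumed beliefs the YES side is worth 0.6, and it resolves NO if the 20% reading prevails, as it did here.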

Detection in code:

import re

SUBJECTIVE_LEXICON = {
    "significantly", "substantially", "materially", "meaningfully",
    "effectively", "primarily", "appropriately", "appreciably"
}

def has_subjective_terms(text: str) -> list[str]:
    # Match whole words only, so "immaterially" is not reported as "materially".
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(SUBJECTIVE_LEXICON & words)

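// TypeScript version of the same lexicon pre-filter.
const SUBJECTIVE_LEXICON = new Set([
  "significantly", "substantially", "materially", "meaningfully",
  "effectively", "primarily", "appropriately", "appreciably",
]);

function hasSubjectiveTerms(text: string): string[] {
  // Tokenize into whole words so "ineffectively" is not reported as "effectively".
  const words = new Set<string>(text.toLowerCase().match(/[a-z]+/g) ?? []);
  return [...SUBJECTIVE_LEXICON].filter((w) => words.has(w)).sort();
}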

This catches the top patterns but misses entity-level subjectivity (e.g. "leading source"). For those, run the full extraction via /v1/evaluate-rules.

Implementation Notes

Trade these markets with extreme size discipline. A SUBJECTIVE_JUDGMENT driver above 18 points should reduce exposure caps to ~25% of base for that market. The realized loss distribution is heavy-tailed enough that the cumulative drawdown from even small positions can wreck a quarter.

Watch for retroactive threshold-setting. When a market enters dispute on subjective grounds, the committee will pick a threshold. Subsequent markets with similar language will be governed by that precedent — but until then, you have no anchor. Persist resolved interpretations and reuse them.
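One way to persist and reuse interpretations is an append-only log keyed by the subjective phrase. The sketch below is an assumption about how a desk might do this, not a SettleRisk feature: `Precedent`, `PrecedentLog`, and the JSON-lines file layout are all hypothetical.

```python
import json
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class Precedent:
    phrase: str          # subjective term as it appeared in the rules text
    interpretation: str  # what the committee decided it means
    market_id: str       # market where the interpretation was set

class PrecedentLog:
    """Append-only JSON-lines log of resolved interpretations (hypothetical)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def record(self, precedent: Precedent) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(precedent)) + "\n")

    def lookup(self, rules_text: str) -> list[Precedent]:
        """Return stored precedents whose phrase occurs in the rules text."""
        if not self.path.exists():
            return []
        text = rules_text.lower()
        found = []
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                precedent = Precedent(**json.loads(line))
                if precedent.phrase.lower() in text:
                    found.append(precedent)
        return found

# Usage: record the tariff ruling, then check a new market's rules against it.
with tempfile.TemporaryDirectory() as tmp:
    log = PrecedentLog(Path(tmp) / "precedents.jsonl")
    log.record(Precedent("significantly", "20% threshold",
                         "tariffs-significant-impact-2026"))
    matches = log.lookup("Will the new levy significantly reduce imports?")
```

A match is only a starting anchor. The new market's rules may differ explicitly from the precedent, so the stored interpretation should be shown to the desk alongside the new rules text rather than applied blindly.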

Skip the worst sub-patterns entirely. "As determined by the platform" with no enumerated criteria is a do-not-quote filter. The math doesn't work; the dispute economics don't work; just don't trade it.

| Sub-pattern | Recommended action |
|-------------|--------------------|
| Qualitative threshold | Size at 50% of base |
| Reviewer authority (enumerated criteria) | Size at 70% of base |
| Reviewer authority (no criteria) | Do not quote |
| Vague entity | Size at 60% of base, require explicit fallback rule |
| Multi-criteria unspecified | Size at 40% of base |
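The sizing table and the 18-point rule can be combined into a single lookup. This is a sketch under stated assumptions: the sub-pattern keys are hypothetical labels, the stricter rule is assumed to win when both apply, and a vague-entity market without an explicit fallback rule is assumed to get no size at all.

```python
from typing import Optional

# Multipliers transcribed from the table above; None means "do not quote".
# The dictionary keys are hypothetical labels, not SettleRisk identifiers.
SIZE_MULTIPLIER: dict[str, Optional[float]] = {
    "qualitative_threshold": 0.50,
    "reviewer_authority_enumerated": 0.70,
    "reviewer_authority_no_criteria": None,
    "vague_entity": 0.60,
    "multi_criteria_unspecified": 0.40,
}

HIGH_POINTS = 18        # driver contribution above which the cap tightens
HIGH_POINTS_CAP = 0.25  # ~25% of base, per the sizing guidance above

def max_position(base_size: float, sub_pattern: str, driver_points: float,
                 has_fallback_rule: bool = True) -> float:
    """Maximum position size for a market with a subjective sub-pattern."""
    multiplier = SIZE_MULTIPLIER[sub_pattern]
    if multiplier is None:
        return 0.0
    if sub_pattern == "vague_entity" and not has_fallback_rule:
        # Assumption: "require explicit fallback rule" means no size without one.
        return 0.0
    if driver_points > HIGH_POINTS:
        # Assumption: when both rules apply, the stricter one wins.
        multiplier = min(multiplier, HIGH_POINTS_CAP)
    return base_size * multiplier

# The tariff market: a qualitative threshold with 21 driver points.
tariff_cap = max_position(10_000, "qualitative_threshold", 21)
```

Taking the minimum keeps the table as the default and lets the high-points cap override it only when the driver contribution is large. A desk that prefers to multiply the two factors instead would get smaller sizes still.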

The pricing engine flags these automatically. If score.drivers includes a SUBJECTIVE_JUDGMENT driver with points_contribution > 18, the pricing engine widens fair spread by 50-100 bps relative to what the score alone would suggest.
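For intuition about the size of that adjustment, the sketch below mirrors the rule locally. It is not the pricing engine's actual formula: the linear interpolation between 50 and 100 bps is an assumption, and the `Driver` class and its `name` field are hypothetical (only `points_contribution` comes from the text).

```python
from dataclasses import dataclass

@dataclass
class Driver:
    name: str                   # hypothetical field name
    points_contribution: float  # field name taken from the text above

THRESHOLD_POINTS = 18.0
MAX_POINTS = 26.0  # SUBJECTIVE_JUDGMENT cap
MIN_WIDEN_BPS = 50.0
MAX_WIDEN_BPS = 100.0

def extra_spread_bps(drivers: list[Driver]) -> float:
    """Extra fair-spread width, in bps, for a high subjective-judgment driver."""
    for d in drivers:
        if d.name == "SUBJECTIVE_JUDGMENT" and d.points_contribution > THRESHOLD_POINTS:
            # Assumption: scale linearly from 50 bps at the threshold
            # to 100 bps at the driver's max_points.
            points = min(d.points_contribution, MAX_POINTS)
            frac = (points - THRESHOLD_POINTS) / (MAX_POINTS - THRESHOLD_POINTS)
            return MIN_WIDEN_BPS + frac * (MAX_WIDEN_BPS - MIN_WIDEN_BPS)
    return 0.0

# Driver mix from the tariff example.
drivers = [
    Driver("SUBJECTIVE_JUDGMENT", 21),
    Driver("AMBIGUOUS_WORDING", 15),
    Driver("METRIC_DEFINITION", 9),
]
widen = extra_spread_bps(drivers)
```

Under this assumed interpolation, the tariff market's 21-point contribution maps to 68.75 bps of extra width, and a contribution at or below 18 points adds nothing.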

Failure Modes

1. Trading the underlying instead of the committee. A market with subjective resolution is a meta-market on how reviewers will interpret the rules. Fundamental analysis of the underlying event alone leaves that interpretive risk unpriced.

2. Ignoring precedent. Committees follow precedent. If a similar market resolved with a 20% threshold last quarter, this one will too — unless the rules explicitly differ. Keep a precedent log.

3. Sizing to base on the first appearance. A market whose SUBJECTIVE_JUDGMENT contribution exceeds 18 points should be size-capped well below your base allocation until the precedent is set.

4. Confusing subjective with qualitative. Some qualitative terms have well-established quantitative interpretations (e.g. "investment grade" maps to specific rating thresholds). Those are not subjective in the SettleRisk sense.

5. Skipping evidence spans. When a market scores high on SUBJECTIVE_JUDGMENT, the specific phrase in the rules text is the highest-signal info on the page. Always show it to the desk.

Checklist

[ ] Maintain a lexicon-based pre-filter for subjective terms
[ ] Use the full extraction for entity-level subjectivity
[ ] Reduce exposure caps for subjective drivers > 18 pts
[ ] Quote-skip "as determined by the platform" with no criteria
[ ] Persist resolved precedents for reuse
[ ] Subscribe to dispute.resolved to capture new precedents

Sources + Further Reading

SettleRisk methodology — full SUBJECTIVE_JUDGMENT pattern set
Ambiguous wording post — related linguistic patterns
Driver attribution post — how drivers combine
Vagueness in Legal Drafting (Endicott 2000) — adjacent academic literature
Polymarket UMA dispute appeals — empirical reversal rates

Free key at /signup — extract drivers on any rules text in 200ms.