Three variants scored against the same eval set. Composite is 0.3·category + 0.3·appealability + 0.2·action + 0.2·artifact. Refused→appeal requires both the model’s refusal and a judge action ≥ 0.7 — the “right for the right reason” gate.
| Variant | Composite | Category | Appealable | Refused→Appeal | Action | Artifact | Latency | Cache | Run |
|---|---|---|---|---|---|---|---|---|---|
Zero-shotzero-shot | — | — | — | — | — | — | — | — | |
Few-shotfew-shot | — | — | — | — | — | — | — | — | |
Structured-firststructured-extraction-first | — | — | — | — | — | — | — | — |