PLV by ThoughtProof: structured plan-level verification for agent decisions. Cascade pipeline, optional EAS attestation, two tiers.
Hosted at verify.thoughtproof.ai. Provider-neutral. Open-source engine.
LLMs produce confident, well-structured nonsense. Without cross-model verification, hallucinated facts look identical to correct ones.
A single model can't distinguish adversarial inputs from legitimate ones, because any check it runs on itself shares its own vulnerability profile.
EU AI Act (Art. 9, 13, 14) requires documented human oversight and risk mitigation. A single model output produces no audit trail.
Self-monitoring produces correlated blind spots. On SHADE-Arena, same-family monitors score near random chance (AUC ~0.50). Cross-family monitors detect the same sabotage immediately.
PLV is a structured plan-level verification process for AI-agent traces. It maps reasoning to explicit plan steps, returns a public ALLOW/BLOCK/UNCERTAIN verdict, and can issue opt-in Base EAS attestations for on-chain audit trails.
Standardize input across providers and formats
→ Convert trace + answer into explicit plan-step evidence
→ Evaluate each step for support, faithfulness, and hard-fail risk
→ Escalate ambiguous cases from the fast models to a Sonnet rescue pass
→ Return ALLOW/BLOCK/UNCERTAIN plus an optional EAS attestation
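
The steps above can be read as a cascade: cheap evaluators run first, and only cases they leave UNCERTAIN are escalated to the next tier. The Python below is a minimal sketch of that control flow, not the PLV engine itself. Every name in it (`StepEvidence`, `StepResult`, `aggregate`, `run_cascade`, the `min_conf` threshold, and the toy evaluators) is hypothetical, and the aggregation rules are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence


class Verdict(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"
    UNCERTAIN = "UNCERTAIN"


@dataclass(frozen=True)
class StepEvidence:
    """One explicit plan step plus the trace text offered in support of it."""
    step: str
    evidence: str


@dataclass(frozen=True)
class StepResult:
    supported: bool    # does the evidence support the step?
    faithful: bool     # does the stated reasoning match the step?
    hard_fail: bool    # does the step trip a hard rule?
    confidence: float  # evaluator's self-reported confidence in [0, 1]


# Hypothetical evaluator interface: any callable that scores one step.
Evaluator = Callable[[StepEvidence], StepResult]


def aggregate(results: Sequence[StepResult], min_conf: float) -> Verdict:
    """Assumed rules: hard fail blocks; low confidence or no steps is uncertain."""
    if not results:
        return Verdict.UNCERTAIN
    if any(r.hard_fail for r in results):
        return Verdict.BLOCK
    if any(r.confidence < min_conf for r in results):
        return Verdict.UNCERTAIN
    if all(r.supported and r.faithful for r in results):
        return Verdict.ALLOW
    return Verdict.BLOCK


def run_cascade(
    steps: Sequence[StepEvidence],
    tiers: Sequence[Evaluator],
    min_conf: float = 0.8,
) -> Verdict:
    """Try each tier in order; escalate only while the verdict is UNCERTAIN."""
    verdict = Verdict.UNCERTAIN
    for evaluate in tiers:
        verdict = aggregate([evaluate(s) for s in steps], min_conf)
        if verdict is not Verdict.UNCERTAIN:
            break  # a definite ALLOW or BLOCK is never re-opened by a later tier
    return verdict


# Toy stand-ins for a fast model and a rescue model (illustrative only).
def fast_model(s: StepEvidence) -> StepResult:
    conf = 0.5 if "transfer" in s.step else 0.95
    return StepResult(supported=True, faithful=True, hard_fail=False, confidence=conf)


def rescue_model(s: StepEvidence) -> StepResult:
    ok = "unverified" not in s.evidence
    return StepResult(supported=ok, faithful=True, hard_fail=False, confidence=0.95)


steps = [
    StepEvidence("look up price", "quote returned by pricing API"),
    StepEvidence("transfer funds", "unverified recipient address"),
]
print(run_cascade(steps, [fast_model, rescue_model]).value)
```

In this sketch a BLOCK from an early tier ends the cascade, so a later tier cannot turn it into an ALLOW. That is one simple way to avoid BLOCK→ALLOW regressions; whether PLV enforces it this way is an assumption.
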
| Metric | Value | Benchmark | Layer |
|---|---|---|---|
| PLV Accuracy | 98.1% | Standard tier · allow recall · 0 false ALLOWs | Plan-Level |
| BLOCK→ALLOW Safety Regressions | 0 | 120-case validation · Hard Rule clean | Safety |
| Default Tier | SERV Nano→Gemini | standard · $0.04 | API v2 |
| Thorough Tier | 88.6% accuracy | SERV Nano→Gemini→Sonnet · 120 cases · 0 BLOCK→ALLOW | Production |
| Cost Efficiency | 107× perf/$ | vs. previous cascade · same 120 cases | Economics |
| Sabotage Detection AUC | 0.720 | SHADE-Arena · 208 runs | Resistance |
| Faithfulness Detection | 88.9% | RFEval · external benchmark | Faithfulness |
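
To make the AUC figures in the table concrete: AUC is the probability that a monitor gives a randomly chosen sabotage run a higher suspicion score than a randomly chosen benign run, so 0.50 is chance and 1.0 is perfect separation. The sketch below computes it by direct pairwise comparison. The function name `auc` and all scores are made up for illustration and are not SHADE-Arena data.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative suspicion scores only (not benchmark data).
sabotage_runs = [0.9, 0.6, 0.4]
benign_runs = [0.5, 0.3, 0.2]

print(round(auc(sabotage_runs, benign_runs), 3))  # 8 of 9 pairs ranked correctly
print(auc([0.5, 0.5], [0.5, 0.5]))                # uninformative monitor, all ties
```

A monitor that scores every run the same cannot rank sabotage above benign runs, which is what an AUC near 0.50 indicates.
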