Reflective Evaluation
Requires the model to evaluate its own output before revising or committing to a final answer.

Problem
Default generation optimizes for plausible output rather than post-hoc checking, so missed constraints and factual errors pass through unchecked.
Solution
Insert an explicit self-evaluation phase (critique, verification, or confidence gate) before finalizing. Reapplying the criteria in a second pass catches errors that the initial generation pass often misses.
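As a rough illustration, the draft, evaluate, and revise phases could be wired together as in the sketch below. The `Model` type, the `reflective_answer` function, the prompt wording, and the `PASS` convention are all assumptions made for this example, not part of any particular library. `scripted_model` is a stand-in that returns canned replies so the sketch runs without a real model.

```python
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to a reply string.
Model = Callable[[str], str]


def reflective_answer(model: Model, task: str, criteria: list[str],
                      max_rounds: int = 2) -> str:
    """Draft an answer, check it against named criteria, revise if needed."""
    draft = model(f"Task:\n{task}\n\nWrite a draft answer.")
    checklist = "\n".join(f"- {c}" for c in criteria)
    for _ in range(max_rounds):
        # Evaluation phase: reapply the criteria to the draft in a separate pass.
        critique = model(
            f"Task:\n{task}\n\nDraft:\n{draft}\n\n"
            f"Check the draft against each criterion:\n{checklist}\n"
            "Reply PASS if every criterion is met; otherwise list the issues."
        )
        if critique.strip().upper().startswith("PASS"):
            break
        # Revision phase: fix only the issues the evaluation raised.
        draft = model(
            f"Task:\n{task}\n\nDraft:\n{draft}\n\nIssues:\n{critique}\n\n"
            "Revise the draft to fix only these issues."
        )
    return draft


def scripted_model(replies: list[str]) -> Model:
    """Stand-in for a real model call: returns canned replies in order."""
    queue = iter(replies)
    return lambda prompt: next(queue)


final = reflective_answer(
    scripted_model([
        "draft v1",                                      # initial draft
        "Issue: one entry limits liability instead.",    # first evaluation fails
        "draft v2",                                      # revision
        "PASS",                                          # second evaluation passes
    ]),
    task="List clauses that impose liability on the buyer.",
    criteria=["Every entry imposes liability on the buyer."],
)
print(final)  # prints: draft v2
```

Capping the loop with `max_rounds` is a design choice in this sketch: it keeps the evaluation phase from cycling indefinitely when the critique never returns `PASS`.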
Example
One worked example. The book has 2–4 per technique.
You are reviewing a supplier contract. Identify all clauses that impose liability on the buyer. List each clause with its section number and a one-sentence summary. Do not include clauses that limit liability. After writing the list, review each entry: does it impose liability on the buyer (include) or limit liability (exclude)? Correct any misclassified entries before giving your final answer.
Draft list:
- Section 4.2 — The buyer is liable for all shipping damages regardless of carrier.
- Section 7.1 — Limits buyer's total liability to contract value.
- Section 9.3 — Buyer assumes responsibility for customs duties and related fines.
Review:
- Section 4.2 — imposes liability. Include. ✓
- Section 7.1 — limits liability. Exclude per instructions.
- Section 9.3 — imposes liability. Include. ✓
Final answer:
- Section 4.2 — The buyer is liable for all shipping damages regardless of carrier.
- Section 9.3 — Buyer assumes responsibility for customs duties and related fines.
Techniques
Concrete ways to implement Reflective Evaluation. Each technique fits a different situation.
- 01
Self-Critique
Generate an answer, critique it against specific named criteria, then revise based only on the issues raised.
- 02
Claim Verification
Extract each factual claim as an isolated question, answer it independently without referencing the draft, and revise where verification conflicts.
- 03
Confidence Gating
Produce a reliability signal (numeric rating, uncertainty breakdown, or binary gate) alongside the answer so downstream decisions can be made with better information.
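The Claim Verification and Confidence Gating techniques above might be sketched as follows. This is a minimal illustration under stated assumptions: the `Model` type, the function names, the prompt wording, the `Confidence:` line format, and the 0.7 threshold are all hypothetical choices, and `scripted_model` returns canned replies in place of a real model.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass
class ClaimCheck:
    claim: str
    independent_answer: str
    consistent: bool


def verify_claims(model: Model, draft: str) -> list[ClaimCheck]:
    """Claim Verification: re-answer each claim without showing the draft."""
    listing = model(f"List each factual claim in this text, one per line:\n\n{draft}")
    claims = [ln.lstrip("- ").strip() for ln in listing.splitlines() if ln.strip()]
    checks = []
    for claim in claims:
        question = model(f"Rewrite this claim as a neutral question:\n{claim}")
        answer = model(question)  # the draft is deliberately left out of this prompt
        verdict = model(
            f"Claim: {claim}\nIndependent answer: {answer}\n"
            "Do these agree? Reply YES or NO."
        )
        agrees = verdict.strip().upper().startswith("YES")
        checks.append(ClaimCheck(claim, answer, agrees))
    return checks


def confidence_gate(model: Model, task: str, threshold: float = 0.7) -> dict:
    """Confidence Gating: attach a numeric rating and a binary accept flag."""
    reply = model(f"{task}\n\nEnd with a line of the form 'Confidence: <0 to 1>'.")
    match = re.search(r"Confidence:\s*([0-9]*\.?[0-9]+)", reply)
    # A missing rating is treated as zero confidence (an assumption of this sketch).
    confidence = min(1.0, max(0.0, float(match.group(1)))) if match else 0.0
    answer = reply[:match.start()].strip() if match else reply.strip()
    return {"answer": answer, "confidence": confidence,
            "accepted": confidence >= threshold}


def scripted_model(replies: list[str]) -> Model:
    """Stand-in for a real model call: returns canned replies in order."""
    queue = iter(replies)
    return lambda prompt: next(queue)


# Canned replies for illustration only; they are not real model output.
checks = verify_claims(
    scripted_model([
        "- The contract was signed in 2019.\n- The buyer pays customs duties.",
        "When was the contract signed?", "2021", "NO",
        "Who pays customs duties?", "The buyer", "YES",
    ]),
    draft="The contract was signed in 2019. The buyer pays customs duties.",
)
needs_revision = [c.claim for c in checks if not c.consistent]

gated = confidence_gate(
    scripted_model(["Section 9.3 imposes liability on the buyer.\nConfidence: 0.55"]),
    task="Does Section 9.3 impose liability on the buyer?",
)
print(needs_revision, gated["accepted"])
```

Claims listed in `needs_revision` are the ones where the independent answer conflicts with the draft. The `accepted` flag lets a downstream step decide whether to release the answer or route it for review.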
