Reflective Evaluation

Requires the model to evaluate its own output before revising or committing to a final answer.

Also known as: Self-Evaluation, Self-Critique, Self-Assessment, Self-Verification

Problem

Default generation optimizes for plausible output, not post-hoc checking, so missed constraints and factual errors pass through unchecked.

Solution

Insert an explicit self-evaluation phase (critique, verification, or confidence gate) before finalizing. Reapplying the task's criteria in a second pass catches errors that the initial generation pass often misses.
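
As a rough illustration, the evaluation phase can be run as separate model calls: draft, evaluate, then revise only if the evaluation finds something. This is a minimal sketch, not a reference implementation. The `Model` type, the function name `answer_with_self_evaluation`, the prompt wording, and the `PASS` convention are all assumptions made for this example; `model` stands in for whatever client call you actually use.

```python
from typing import Callable

# Hypothetical model interface: any function that maps a prompt to text.
Model = Callable[[str], str]


def answer_with_self_evaluation(model: Model, task: str, criteria: list[str]) -> str:
    """Draft an answer, evaluate it against named criteria, revise if needed."""
    draft = model(f"Task:\n{task}\n\nWrite a draft answer.")

    checklist = "\n".join(f"- {c}" for c in criteria)
    critique = model(
        "Evaluate the draft against each criterion below. "
        "List every violation, or reply with exactly PASS if there are none.\n\n"
        f"Criteria:\n{checklist}\n\nDraft:\n{draft}"
    )

    # Assumed convention: the evaluator replies PASS when nothing needs fixing.
    if critique.strip() == "PASS":
        return draft

    # Revise based only on the issues raised in the evaluation pass.
    return model(
        "Revise the draft to fix only the issues listed. "
        "Leave everything else unchanged.\n\n"
        f"Draft:\n{draft}\n\nIssues:\n{critique}"
    )
```

Keeping the evaluation in its own call means the criteria are restated right next to the draft, which is the second pass the pattern asks for.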

Example

One worked example. The book has 2–4 per technique.

Prompt

You are reviewing a supplier contract. Identify all clauses that impose liability on the buyer. List each clause with its section number and a one-sentence summary. Do not include clauses that limit liability.

After writing the list, review each entry: does it impose liability on the buyer (include) or limit liability (exclude)? Correct any misclassified entries before giving your final answer.

Response

Draft list:
- Section 4.2 — The buyer is liable for all shipping damages regardless of carrier.
- Section 7.1 — Limits buyer's total liability to contract value.
- Section 9.3 — Buyer assumes responsibility for customs duties and related fines.

Review:
- Section 4.2 — imposes liability. Include. ✓
- Section 7.1 — limits liability. Exclude per instructions.
- Section 9.3 — imposes liability. Include. ✓

Final answer:
- Section 4.2 — The buyer is liable for all shipping damages regardless of carrier.
- Section 9.3 — Buyer assumes responsibility for customs duties and related fines.
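
When the draft, review, and final answer all arrive in one response, as in the example above, downstream code usually wants only the final list. The sketch below assumes the model reliably emits the literal label `Final answer:` followed by `- ` bullets, which is a convention taken from this example and not something a model guarantees. The function name `extract_final_answer` is hypothetical.

```python
def extract_final_answer(response: str, marker: str = "Final answer:") -> list[str]:
    """Return the bullet entries that follow the last occurrence of the marker."""
    _, found, tail = response.rpartition(marker)
    if not found:
        # Fail loudly: do not fall back to the unreviewed draft list.
        raise ValueError(f"response has no {marker!r} section")

    entries = []
    for line in tail.splitlines():
        line = line.strip()
        if line.startswith("- "):
            entries.append(line[2:].strip())
    return entries
```

Raising when the marker is missing is a deliberate choice here: silently using the draft would let the misclassified Section 7.1 entry through.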

Techniques

Concrete ways to implement Reflective Evaluation. Each technique fits a different situation.

01. Self-Critique: Generate an answer, critique it against specific named criteria, then revise based only on the issues raised.

02. Claim Verification: Extract each factual claim as an isolated question, answer it independently without referencing the draft, and revise where verification conflicts.

03. Confidence Gating: Produce a reliability signal (numeric rating, uncertainty breakdown, or binary gate) alongside the answer so downstream decisions can be made with better information.
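
To make the third technique concrete, here is one possible shape for a confidence gate using a numeric self-rating. It is a sketch under stated assumptions: the `Model` type, the `GatedAnswer` class, the `gated_answer` function, the `CONFIDENCE: <n>` reply format, the 1 to 5 scale, and the default threshold of 4 are all invented for illustration and would need tuning for a real task.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical model interface: any function that maps a prompt to text.
Model = Callable[[str], str]


@dataclass
class GatedAnswer:
    answer: str
    confidence: Optional[int]  # self-rating from 1 to 5, or None if unparseable
    accepted: bool             # True when the rating meets the threshold


def gated_answer(model: Model, question: str, threshold: int = 4) -> GatedAnswer:
    """Answer a question, then ask for a confidence rating and gate on it."""
    answer = model(question)
    rating_text = model(
        "Rate how confident you are that the answer below is correct, "
        "from 1 (guess) to 5 (certain). Reply in the form 'CONFIDENCE: <n>'.\n\n"
        f"Question:\n{question}\n\nAnswer:\n{answer}"
    )

    match = re.search(r"CONFIDENCE:\s*([1-5])\b", rating_text)
    confidence = int(match.group(1)) if match else None

    # Fail closed: an unparseable rating is treated as not accepted.
    accepted = confidence is not None and confidence >= threshold
    return GatedAnswer(answer, confidence, accepted)
```

A caller can then route on `accepted`, for example sending rejected answers to a human reviewer or to a claim-verification pass, which is the kind of downstream decision the reliability signal is meant to inform.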
