Playbooks & Templates

Prompt Review Checklist

This is a pre-send check for a single prompt, run once in the seconds before you hit enter. It is not the prompt evaluation checklist, which judges outputs across many runs against a rubric. The review asks one question: does this prompt give the assistant enough role, context, constraints, examples, and a testable definition of done to return something you can actually use on the first try?

PlaybookPrompting

When to use this

You are about to send a prompt that will produce code you intend to keep, like a paginated orders export on an internal service.
Your last three attempts came back close but wrong, and you are about to retype the request instead of fixing it.
You are writing a prompt someone else will reuse, so vague wording will cost more than your own time.
The task has a real constraint the assistant cannot guess, such as a database that must not be touched or a library version you are pinned to.
You want the output in a specific shape (a JSON object, a diff, a table) and the request does not yet say so.
You are pasting a long file or log into the prompt and the actual instruction is buried somewhere in the middle.
You are tempted to send a one-line request like "fix the export" and call it a prompt.

What it helps clarify

Whether the assistant knows who it is supposed to be and what success looks like, or is guessing at both.
Which facts are missing context the model cannot infer, versus background noise that just inflates the prompt.
Whether your constraints are stated as rules the model can follow, or left implied in your head.
Whether the output format is specified precisely enough that you could write a test for it.
Whether an example would settle the ambiguity faster than another paragraph of description.
Whether the prompt is ready to send, or one cheap edit away from saving you a wasted round trip.

Why a thirty-second review beats another retry

Maya needs cursor pagination on the orders export. She types "add pagination to the orders export, make it fast" and sends it. The assistant guesses a framework, invents a response shape, and rewrites a file she only wanted patched. She retypes the request with slightly different words. Same class of miss. Four round trips later she has spent more time steering than the change would have taken to write by hand. The prompt was missing the facts the model could not invent, and rephrasing the same gap a fourth time supplies none of them.

This is the gap the review closes. A language model fills every blank you leave with the most probable guess, and a probable guess is not your codebase. Anthropic's own prompting guidance frames the model as a brilliant but new employee who lacks context on your norms and workflows, and offers a test worth memorizing: show your prompt to a colleague with minimal context on the task and ask them to follow it. If they would be confused, the model will be too. That single rule catches most of what this checklist itemizes.

Skipping the review starts a loop, and the loop is the real cost. Each retry feels cheap, so you never stop to diagnose, and the loop quietly eats more time than reading your draft once would have. The review is a deliberate pause that turns "send and see" into "check and send". Thirty seconds against a ten-line list is almost always cheaper than a second wrong response, because a wrong response still has to be read, judged, and replaced. The checklist is the structured version of the colleague test: instead of hoping you remembered the file path and the constraint, you confirm each one before the model ever sees the request.

How to use each line well, with a good and a weak entry

Every line is a yes/no you answer by reading your own draft. The skill is telling a real yes from a yes you wish were true. Here is what strong looks like next to weak on the lines people fumble most.

Role and goal. Weak: "help me with the export." Strong: "You are working in our Node admin service; add cursor pagination to the orders export so a client can fetch 10k rows in pages without timing out." The strong version names the role and a checkable outcome in one breath.
Context, present and trimmed. Two failure modes pull in opposite directions. Weak by omission: no file path, no framework, no mention that the export already streams. Weak by flooding: the entire 800-line file pasted in when only the handler matters. Strong: the file path, the one helper to reuse, and the one rule that constrains the change. Anthropic suggests expressing context as a short set of decision-shaping facts and keeping raw material clearly separated, so the facts that steer the answer are not buried under the ones that do not.
Constraints as rules. Weak: "make it fast" or "keep it clean," which the model cannot act on. Strong: "do not change the response schema beyond adding nextCursor; reuse paginate() from lib/pagination.ts; keep the handler under 80 lines." A constraint the model can verify against is a constraint that survives contact with the output.
Output format named. Weak: leaving it open, then being annoyed it rewrote the file. Strong: "return a unified diff only, no prose," or "return JSON with keys orders and nextCursor." When the shape is easy to get wrong, add one short example. Anthropic's guidance is blunt here: one good example beats five adjectives, and few-shot examples (use three to five, each wrapped so it reads as a sample) are among the most reliable ways to pin format and tone.
Positive phrasing. Weak: "do not use markdown." Strong: "respond in plain prose paragraphs." Telling the model what to do steers it more reliably than telling it what to avoid, because the instruction points at a target instead of a wall.
Instruction placed last. When you paste a long file or log, put the actual request after it. Anthropic reports that moving the query to the end of a long-context prompt can lift response quality by up to 30% on complex, multi-document inputs. A request stranded in the middle competes with everything around it.
Testable success and handled uncertainty. Weak: "make it good." Strong: "stays under 200 ms for 10k rows," plus a line telling the model to stop and ask if the helper does not fit rather than inventing a new one. A definition of done you could write a test for is one you can actually check.

You do not need a perfect score. You need to find the one line that is a real no and fix that one. Most failed prompts miss a single thing, and the fix is usually a sentence, not a rewrite.

Pitfalls, and where the review hands off

A few traps make the checklist feel done while the prompt is still weak.

Ticking boxes you wish were true. The point of failure is reading "constraints explicit" and thinking "fast counts." It does not. If you cannot imagine the test, the line is still a no. Be your own skeptical colleague.
Confusing review with evaluation. This checklist judges one prompt before it runs. It says nothing about whether the prompt holds up across many inputs over time. A prompt that passes review can still drift or fail on the inputs you did not picture. That ongoing question belongs to the Prompt Evaluation Checklist, which scores outputs against a rubric over a fixed input set of roughly 50-200 cases, the small curated golden set teams run on every change to catch regressions. Use review to ship a good first prompt; use evaluation to trust a prompt you depend on.
Over-armoring the prompt. Older habits push people to shout: "CRITICAL: you MUST return JSON, NEVER add prose." Current models follow normal instructions well and can overtrigger on forceful, all-caps wording, doing more than asked or treating every clause as urgent. Dial it back to plain phrasing. The clarity does the work, not the volume.
Treating the review as the whole job. A clean prompt produces a response; it does not vouch for it. The review ends the moment the model replies. From there the output is just code, and code earns trust by being read.

On a team, the review pays off twice. A prompt one person reviews well becomes a prompt anyone can reuse, because the role, context, and format are on the page instead of in one person's head. Saved prompts, a project's CLAUDE.md, and shared snippets are where good reviewed prompts go to live, so the next person starts from a passing draft.

For your next real task, do not adopt all ten lines at once. Pick the line you skip most, probably "output format named" or "success is testable," and force a yes on it before your next send. When you are scoping coding work, start at the AI Coding Session Brief and Giving AI the Right Context so the prompt has facts to reference. When the assistant returns a diff, move to the AI Code Review Checklist and the Reviewing AI Code Safely guide. The review gets you a strong question; those handle whether you can trust the answer.

The checklist

Run top to bottom before you send. Each line is a yes/no you can answer by reading your own draft. A no is a thing to fix, not a thing to note.

Role and goal : The prompt states who the assistant should act as and what "done" looks like in one concrete sentence.
Context present : Every fact the model cannot infer is in the prompt, such as the stack, the file, the constraint, the data shape.
Context trimmed : Background that does not change the answer is cut, so the decision-shaping facts stand out.
Constraints explicit : Limits are written as rules, such as do not modify the schema, keep it under 150 lines, use the existing pagination helper.
Output format named : The exact shape is specified, such as return a unified diff, a JSON object with these keys, or a numbered list.
Example when format matters : One short worked example shows the shape you want, wrapped clearly so it reads as a sample, not an instruction.
Success is testable : The definition of done is something you could check or write a test for, not a mood like "make it good".
Positive phrasing : Instructions say what to do rather than only what to avoid, which steers the model more reliably.
Instruction placed last : When you paste a long file or log, the actual request sits at the end, below the pasted material.
Uncertainty handled : The prompt tells the model to ask or flag assumptions when a detail is missing, rather than invent one.

Example

Worked example: Maya reviews a prompt before sending it

Task: add cursor pagination to the orders export on the internal admin service. Draft prompt (before review): "Add pagination to the orders export. Make it fast." Review pass: - Role and goal: missing. No statement of done. - Context: missing the file, the framework, and that the export is already streamed. - Constraints: "fast" is a mood, not a rule. The real rule is unstated. - Output format: not named. Maya wants a diff, not a rewrite. - Example: not needed yet; the format rule will carry it. - Testable: "fast" cannot be checked. "Stays under 200 ms for 10k rows" can. Revised prompt (after review): "You are working in our Node admin service. The file is routes/orders-export.ts; it currently streams all orders in one response. Add cursor-based pagination using the existing paginate() helper in lib/pagination.ts. Do not change the response schema beyond adding a nextCursor field. Return a unified diff only, no prose. Keep the handler under 80 lines. If the helper does not fit, stop and tell me why instead of writing a new one." Result: one round trip instead of four.

Usage notes

Use this before you send. To judge a prompt that is already in use, switch to the Prompt Evaluation Checklist, which scores outputs across a fixed set of inputs over many runs.
Most failed prompts miss exactly one line on this list. Find that line and fix it rather than rewriting the whole request.
Keep prompts free of aggressive all-caps demands like "CRITICAL: you MUST". Current models follow normal instructions well and can overreact to forceful wording, so plain phrasing is enough.
For coding prompts, pair this with the AI Coding Session Brief to scope the task first, then with Giving AI the Right Context to assemble the files and constraints the prompt references.
Once the assistant returns code, the review is over and a different bar applies. Run the AI Code Review Checklist on the diff before you keep it.
Revisit your own habits monthly. The line you skip most often is the one worth turning into a saved snippet or a CLAUDE.md note.

Copy this into your notes

# Prompt Review Checklist Prompt being reviewed: > (paste your draft prompt here) ## Pre-send checks - [ ] Role and goal stated; "done" is one concrete sentence - [ ] Context present: stack, file, constraint, data shape - [ ] Context trimmed: no background that does not change the answer - [ ] Constraints written as rules, not implied - [ ] Output format named (diff / JSON keys / table / list) - [ ] Example included when the format is easy to get wrong - [ ] Success is testable, not a mood - [ ] Phrased as what to do, not only what to avoid - [ ] Instruction placed after any long pasted file or log - [ ] Model told to ask or flag assumptions when a detail is missing ## Fix log - The one line that was a no: - What I changed:

Downloadable version

A one-page pre-send check for a single prompt, before it ever runs.

Preview

Role and intent

The prompt names the role the assistant should take, even if it is one sentence.
The goal is stated as a concrete definition of done, not a vague ask.
The single most important outcome is obvious from the first two lines.

Context

Every fact the model cannot infer is included: stack, file path, framework, data shape.
The relevant constraint from the wider system is present, such as a schema that must stay stable.
Background that does not change the answer has been cut so the key facts stand out.
When a long file or log is pasted, the actual instruction sits at the end, below it.

Constraints and format

Limits are written as explicit rules, not left implied in your head.
The exact output shape is named: a diff, a JSON object with set keys, a table, a numbered list.
One short worked example is included when the format is easy to get wrong.
Instructions say what to do rather than only what to avoid.

Definition of done

Success is something you could check or write a test for, not a mood like "make it good".
The prompt tells the model to ask or flag assumptions when a detail is missing.
You would be willing to bet the first response is usable; if not, find the one weak line and fix it.

Download PDF