Guide · AI in the Developer Workflow

Using Claude for Development

Claude helps across the whole task, not only the typing: planning a change, reasoning through a bug, refactoring, drafting docs, and reading a diff. It predicts plausible output from what is in its context, so your control comes from three levers: the context you supply, the plan you agree before it writes, and the diff you review before you keep it. You make every decision, including the merge.

Claude · Planning · Review

~10 min

When to use this

You want to think through a change, a bug, or a design choice before you write any code.
You want a second read on a diff, an error, or a tradeoff you are unsure about.
The same project conventions and standards apply to every chat, and re-pasting them each time wastes your effort.
You are working in a real codebase where a plausible-but-wrong change would cost you, so you want a plan and a review gate.

Key ideas

Partner, not autopilot: Claude explores, plans, drafts, and critiques, and you keep every decision. The official Anthropic workflow is explore, plan, implement, commit, with you in the loop at each step. You describe the outcome and Claude proposes how to build it, but the diff is a draft until you have read it.
Plan before any code: Ask for the approach, the files it would touch, the risks, and the edge cases before implementation. A plan is cheap to revise; a wrong diff is expensive to unwind. Planning earns its overhead when the approach is uncertain, the change spans several files, or the code is unfamiliar. If you could describe the diff in one sentence, skip the plan.
Context decides quality: Claude reasons over the files, errors, types, and constraints you give it, and guesses when you withhold them. On Claude.ai, a Project holds your standing context (the conventions and reference docs), so every conversation inside it starts from the same ground without re-pasting.
A fresh reviewer beats the author: A model asked to review code it just wrote is biased toward defending it. Open a fresh conversation, paste only the diff and the criteria, and ask what assumptions the code is making. This Writer then Reviewer split works for the same reason a second pair of human eyes catches what the author missed.
You own the merge: A December 2025 CodeRabbit study of 470 pull requests found AI-coauthored changes carry about 1.7x more issues than human-written ones, with logic and correctness defects up 75%. Plausible is the dangerous kind of wrong, because it survives a quick read. Review the diff line by line and run the code before you keep it.

Why this matters

The fastest way to misuse Claude is to treat it as an autopilot: paste a one-line request, accept whatever code comes back, and move on. Claude types far faster than you do, so that loop feels productive. The cost shows up later, in the gap between what you asked for and what the code actually assumed.

Picture a developer asking Claude to "make the auth module expire idle sessions after 30 minutes." Claude returns a clean, well-formatted change in seconds. The developer skims it, sees a timer and a comparison against a timestamp, and commits. What the prompt never mentioned: sessions can be refreshed concurrently from two tabs, and the existing code stores the last-active time in UTC while the new check reads it as local time. The change passes the one test the developer was watching. Two weeks later, users on the far side of a time zone get logged out after a few minutes, and the bug is now buried under a dozen unrelated commits.

Nothing there is an exotic model failure. The output was plausible, it compiled, and it passed the visible check. A December 2025 CodeRabbit study of 470 real-world pull requests put numbers on this pattern: AI-coauthored changes carried about 1.7x more issues than human-written ones, with logic and correctness defects up 75% and performance problems appearing nearly eight times as often. The defects are not random noise. They cluster exactly where the model had to guess because the context did not pin the answer down.
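The time-zone mismatch in the story above is a small instance of that kind of guess. The following is a minimal sketch, not the auth module's real code: it assumes the last-active time is stored as an ISO-8601 string written in UTC but without a zone suffix, and the names `stored`, `readAsLocal`, `readAsUtc`, and `skewMinutes` are hypothetical. JavaScript interprets a date-time string with no offset as local time, so the two reads disagree by the machine's UTC offset.

```typescript
// Hypothetical stored value: written in UTC, but with no "Z" or offset suffix.
const stored = "2025-01-15T12:00:00";

// Buggy read: a date-time string without an offset is interpreted as LOCAL time.
function readAsLocal(value: string): number {
  return new Date(value).getTime();
}

// Intended read: appending "Z" makes the same digits parse as UTC.
function readAsUtc(value: string): number {
  return new Date(value + "Z").getTime();
}

// Gap between the two readings in minutes. It is 0 only on a machine set to UTC,
// which is why a test run on such a machine would not catch the bug.
function skewMinutes(value: string): number {
  return (readAsLocal(value) - readAsUtc(value)) / (60 * 1000);
}
```

On a developer machine or CI runner configured for UTC, `skewMinutes(stored)` is zero and the visible test passes, while a server or user in another zone sees sessions age by hours in one direction or the other.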

The payoff of working with Claude as a partner is concrete and learnable. The model predicts plausible output from what is in its context, and plausible is the dangerous kind, because it survives a quick read in a way an obviously broken answer never would. Your leverage is three gates the workflow gives you for free: the context you supply before it starts, the plan you agree before it writes, and the diff you read before you keep it. The mechanism behind those gates, how Claude actually works on a development task, is what the next section breaks down.

How it works

Claude is a language model that predicts the next token from the tokens already in front of it. On a development task that means it does not browse your repository or remember your project between conversations. It works entirely from what you place in the context window for this exchange. Anthropic's recommended workflow names the four phases where you stay in control.

Explore. Claude reads the files, errors, and types you supply and answers questions about them before proposing anything. This is where you confirm it understood the problem.
Plan. You ask for the approach, the impacted files, the risks, and the edge cases, then read and edit that plan before any code exists. The plan is the cheap place to catch a wrong assumption.
Implement. Claude drafts the change against the agreed plan. You ask it to run a check it can read, such as a test or a build, so "looks done" is replaced by a pass or fail.
Commit. You review the diff, run the code, and commit in small steps with a clear message. The merge is your decision, never the model's.

Two distinctions decide whether this works. The first is plan versus build. Anthropic's own guidance is that planning earns its overhead "when you are uncertain about the approach, when the change modifies multiple files, or when you are unfamiliar with the code," and that "if you could describe the diff in one sentence, skip the plan." A typo fix needs no plan. A change that touches three files and an interface does, because there the wrong approach produces a large diff built on the wrong foundation, and a large wrong diff is far more expensive to review than a short plan.

The second is author versus reviewer. A conversation that just wrote a change is primed to defend it, the same way an author rereads their own draft and sees what they meant rather than what they wrote. A fresh conversation that sees only the diff and the criteria, with none of the reasoning that produced it, evaluates the result on its own terms and catches the assumption the author baked in. This is why Claude is most useful on a real task as two roles, a Writer and a Reviewer, run in separate conversations.

The last piece is where your context lives. Anything specific to one task, the failing test and the error, belongs in the prompt. Anything that holds across every task, the stack, the conventions, the "always do X" rules, belongs in a place Claude reads automatically. On Claude.ai that place is a Project, a workspace with its own custom instructions and uploaded knowledge that every conversation inside it inherits. With the parts named, the next section walks one task through all four phases.

A worked scenario

Maya, a developer, needs to expire idle sessions after 30 minutes in an existing auth module. She has been burned before by accepting a plausible-looking change, so this time she works the three gates deliberately, using a Claude Project she set up for this codebase.

Lean on the Project for standing context. The Project's custom instructions already state the stack, the rule that invalid input throws a RangeError, and that times are stored in UTC. She uploaded src/auth/session.ts as Project knowledge. She does not re-paste any of it.
State the goal and the per-task context. She writes one outcome: "Expire idle sessions after 30 minutes of inactivity in src/auth/session.ts." She pastes the current refreshSession function and the test file verbatim, so the exact signatures are in the window.
Plan first, do not build. She ends the prompt with "Give me a short plan: approach, files touched, risks, and edge cases. Do not write code yet. List your assumptions." Claude proposes comparing now against lastActiveAt, and flags three edge cases: clock skew, two tabs refreshing at once, and a token already past expiry. It also asks whether lastActiveAt is stored in UTC. That question is worth more than the code, because the UTC-versus-local mismatch is exactly the bug from the autopilot story in Why this matters.
Correct the plan, then implement. Maya confirms UTC and tells it to read the stored value as UTC. With the assumption fixed, she asks Claude to implement against the plan and to add a test for the concurrent-refresh case, then run the suite and show the result.
Review in a fresh conversation. She opens a new chat, pastes only the diff and the three constraints, and asks "what assumptions is this code making, and are any unsafe?" The fresh reviewer, not invested in the draft, points out that the timer uses a 30-minute constant duplicated from another file, a maintainability snag the writing conversation never mentioned.
Verify and commit. Maya reads the diff herself, runs the suite (now 18 passing tests, up from 16), confirms the concurrent-refresh test covers the two-tab case, and commits the small change with a clear message.

The before and after is the whole lesson. The autopilot version produced the time-zone bug and the duplicated constant, both invisible to a quick read. The partnered version cost Maya maybe four minutes and two clarifying exchanges, and it surfaced two defects she would otherwise have shipped. Maya and this same auth change carry into the pitfalls below.
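To make the outcome concrete, here is one shape the reviewed change could take. It is a sketch under stated assumptions rather than Maya's actual diff: `lastActiveAt` is assumed to be UTC epoch milliseconds, and `Session`, `IDLE_TIMEOUT_MS`, `isIdleExpired`, and `touchSession` are hypothetical names. It defines the timeout once, which addresses the duplicated constant, and takes the clock as a parameter so the edge cases can be tested.

```typescript
// Hypothetical shared constant: defined once and reused wherever the timeout is needed.
const IDLE_TIMEOUT_MS = 30 * 60 * 1000;

// Hypothetical session shape; lastActiveAt is assumed to be UTC epoch milliseconds.
interface Session {
  id: string;
  lastActiveAt: number;
}

// True once the session has been idle for at least the timeout.
// `now` is injected so tests do not depend on the real clock. If clock skew makes
// `now` earlier than lastActiveAt, the difference is negative and the session stays valid.
function isIdleExpired(session: Session, now: number = Date.now()): boolean {
  return now - session.lastActiveAt >= IDLE_TIMEOUT_MS;
}

// Records activity without ever moving lastActiveAt backwards, so two tabs
// refreshing out of order cannot shorten the session.
function touchSession(session: Session, now: number = Date.now()): Session {
  return { ...session, lastActiveAt: Math.max(session.lastActiveAt, now) };
}
```

Because both functions compare plain epoch numbers, no time-zone interpretation happens at the comparison, and the concurrent-refresh test reduces to calling `touchSession` twice with timestamps in the wrong order.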

Pitfalls and edge cases

Each of these traps feels like it speeds you up, and each quietly removes a gate that was keeping the work honest.

The autopilot accept. A clean-looking diff invites a rubber stamp. The fix is to read the change against your stated constraints and run the code, treating every diff as a draft. If you cannot verify it, do not ship it.
The endless improvement loop. Telling Claude "make it better" over several rounds without re-reading the diff is how a refactor silently changes behaviour. After two corrections on the same point, start a fresh conversation with a sharper prompt that incorporates what you learned, rather than piling more onto a cluttered context.
The kitchen-sink chat. One conversation for "fix the session bug, then update the docs, then look at the flaky test" fills the context with three jobs, and the answers drift as earlier instructions get crowded out. Start a fresh chat per unrelated task so each begins clean.
The author reviewing itself. Asking the same conversation that wrote the code to review it produces a defense, not a review. Move the diff to a fresh conversation that never saw the reasoning, and tell the reviewer to flag only issues that affect correctness or the stated constraints, so it does not over-engineer.

Two genuine edge cases sit underneath those traps.

The over-eager reviewer. A model prompted to find gaps will almost always report some, even when the work is sound, because that is what it was asked to do. Chasing every finding leads to extra abstraction layers, defensive code, and tests for cases that cannot happen. The handling is to scope the review to correctness and the stated requirements, and to treat the rest as optional suggestions, not orders.

The hallucinated dependency. A wrinkle that persists into 2026 is that models invent packages and APIs that do not exist, and roughly one in five generated samples references a nonexistent dependency. Claude proposing import { parseDuration } from 'time-utils' looks exactly like a real import. Any new import in a diff deserves a deliberate check that the package and the symbol are real before you install it, and a standing rule of "do not add new dependencies without asking" closes most of the risk. Keeping these guards consistent matters more once more than one person shares the codebase, which is where scale comes in.
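Part of that deliberate check can be mechanical. The sketch below, with the hypothetical name `findUndeclaredImports`, lists package names imported in a source string that are not in a list of already-declared dependencies (for example, the keys of your package.json). It is a regex-based approximation: it does not handle dynamic `import()` or `require`, and it cannot tell whether a package really exists in the registry, so a flagged name still needs a human to look it up.

```typescript
// Returns package names imported in `source` that are not in `declared`.
// Relative imports, absolute paths, and "node:" builtins are skipped.
function findUndeclaredImports(source: string, declared: string[]): string[] {
  const missing: string[] = [];
  // Matches `from "pkg"` / `from 'pkg'` and side-effect imports like `import "pkg"`.
  const pattern = /(?:from|import)\s+["']([^"']+)["']/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const specifier = match[1];
    const first = specifier.charAt(0);
    if (first === "." || first === "/" || specifier.indexOf("node:") === 0) continue;
    // "@scope/name/sub" -> "@scope/name"; "name/sub" -> "name".
    const parts = specifier.split("/");
    const pkg = first === "@" ? parts.slice(0, 2).join("/") : parts[0];
    if (declared.indexOf(pkg) === -1 && missing.indexOf(pkg) === -1) {
      missing.push(pkg);
    }
  }
  return missing;
}
```

Run against the added lines of a diff, a non-empty result is the cue to stop and verify each name before installing anything.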

Doing it on a team and at scale

One developer can hold the conventions in their head and re-paste them when needed. The moment a second person, a second machine, or a second month is involved, that habit has to live in a shared artifact, or Claude behaves a little differently for everyone and every prompt drifts from how the project actually works.

The durable artifact is the standing context, kept in one place the tool reads every time. On Claude.ai a Project holds the custom instructions and the uploaded reference docs the whole team works against, so a teammate joining a Project inherits the same ground without assembling it by hand. For repository work, the same role is played by a checked-in rules file: CLAUDE.md for Claude Code, or AGENTS.md as the cross-tool standard. Keep it short and high-signal, the stack, the conventions, and a small list of what to prefer and avoid, and prune it when a failure reveals a gap. A bloated context file is worse than a short one, because the model starts ignoring rules lost in the noise.

Priya, who leads the team, owns the part an individual setup cannot. She adds two lines to the pull-request template: which assistant produced a change and the prompt or task that drove it. That record turns a reviewer's vague unease into a concrete question, because reviewing AI output means asking "what assumptions is this code making, and are they safe?" rather than "does this look reasonable?". She also holds AI-coauthored pull requests to the same small-diff bar as any other, under roughly 400 lines, since review capacity, not code volume, is now the bottleneck, and a tight diff is the only kind a human can actually check against the constraints.
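A size bar like this is easy to check before a person ever opens the pull request. The sketch below assumes the input is the text produced by `git diff --numstat`, where each line is the added count, the deleted count, and the path separated by tabs, and binary files report "-" for both counts. The names `MAX_CHANGED_LINES`, `countChangedLines`, and `isReviewableSize` are hypothetical, and the 400 figure is simply the rough bar mentioned above.

```typescript
// Hypothetical limit, taken from the "roughly 400 lines" bar.
const MAX_CHANGED_LINES = 400;

// Sums added + deleted lines from `git diff --numstat` output.
function countChangedLines(numstat: string): number {
  let total = 0;
  for (const line of numstat.split("\n")) {
    const fields = line.split("\t");
    if (fields.length < 3) continue; // blank or malformed line
    const added = parseInt(fields[0], 10);
    const deleted = parseInt(fields[1], 10);
    if (isNaN(added) || isNaN(deleted)) continue; // binary file: "-\t-\tpath"
    total += added + deleted;
  }
  return total;
}

// True when the diff is strictly under the limit.
function isReviewableSize(numstat: string, limit: number = MAX_CHANGED_LINES): boolean {
  return countChangedLines(numstat) < limit;
}
```

Whether binary files or generated lockfiles should count toward the limit is a team decision; this sketch skips binaries and counts everything else.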

Scale also changes the order of the gates. Automated linters and scanners in CI catch a large share of routine issues before a person looks, which frees the human review for the judgment calls the tools cannot make: whether the approach fits the architecture, whether the edge cases are handled, whether a new dependency is even real. The Writer then Reviewer split survives onto a team naturally, since the reviewing engineer is, by definition, a fresh context that did not write the change.

The durable principle to keep, whatever the size of the team, is that Claude proposes and you decide. Supply the context, agree the plan, read the diff, and never trade all three away for the speed of an accept-everything loop. Start with one small thing: set up a single Project for one codebase with your three most-repeated conventions in its instructions, and run your next change as a plan first, then a build, then a fresh-context review.

Workflow

01 State the goal and the constraints, and paste the real files, errors, types, and acceptance criteria.
02 Ask for a short plan with the impacted files, the risks, and the edge cases before any implementation, and edit the plan where it is wrong.
03 Let Claude draft the change against the agreed plan, and ask it to show a check it can run, such as a passing test or a clean build.
04 Open a fresh conversation, paste the diff and the constraints, and ask the reviewer what assumptions the code makes and whether they are safe.
05 Review the diff yourself line by line, then run the code and the tests so the result is evidence you can read.
06 Move durable project rules into a Claude Project or a versioned rules file so you stop re-pasting them each session.
07 Commit in small, reviewable steps so each change can be read and reverted on its own.

Common mistakes

Asking for code with no files or constraints, then debugging the assumptions Claude had to invent to fill the gap.
Telling Claude to improve the code over several rounds without re-reading the diff, which is how a refactor quietly changes behaviour.
Letting one chat sprawl across three unrelated tasks until the answers drift, instead of starting a fresh conversation.
Asking the same conversation that wrote the code to review it, so it defends its own draft rather than challenging it.
Treating a green-looking response as proof, when nobody actually ran the code or read the diff against the constraints.

Examples

Planning prompt

Here is src/auth/session.ts (pasted) and the goal: expire idle sessions after 30 minutes. Before writing code, give me a short plan: the approach, the files you would touch, the risks, and the edge cases I should test (clock skew, concurrent refresh, an already-expired token). Do not write code yet. List your assumptions so I can correct them first.
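If you reuse this prompt often, it can be assembled from its parts so the "no code yet" instruction is never forgotten. This is an illustrative sketch only: `PlanRequest` and `buildPlanningPrompt` are hypothetical names, and the output is plain text to paste into a chat, not a call to any Claude API.

```typescript
// Hypothetical shape for the per-task context; none of these names come from a real API.
interface PlanRequest {
  goal: string;
  files: { path: string; contents: string }[];
  edgeCases: string[];
}

// Assembles a plan-first prompt: the goal, the pasted files, then the planning instruction.
function buildPlanningPrompt(request: PlanRequest): string {
  const pastedFiles = request.files
    .map((file) => `--- ${file.path} ---\n${file.contents}`)
    .join("\n\n");

  return [
    `Goal: ${request.goal}`,
    pastedFiles,
    "Before writing code, give me a short plan: the approach, the files you " +
      "would touch, the risks, and the edge cases I should test " +
      `(${request.edgeCases.join(", ")}).`,
    "Do not write code yet. List your assumptions so I can correct them first.",
  ].join("\n\n");
}
```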

Fresh-context review

Review this diff. Do not assume it is correct.
[paste the diff]
Constraints it must hold:
- Public signature of parseDate(input: string): Date is unchanged.
- Invalid input throws a RangeError, never returns Invalid Date.
- No new dependencies.
What assumptions is this code making, and are any of them unsafe? Flag only issues that affect correctness or these constraints.
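For reference, the constraints in that prompt describe behaviour along these lines. The body below is a hypothetical sketch, since the guide does not show the real parseDate: it assumes the function delegates to the built-in Date constructor, and only the signature and the RangeError rule come from the prompt.

```typescript
// Hypothetical implementation that satisfies the constraints listed in the review prompt:
// same public signature, RangeError on invalid input, and no dependencies.
function parseDate(input: string): Date {
  const parsed = new Date(input);
  // An unparseable string produces an "Invalid Date", whose time value is NaN.
  if (isNaN(parsed.getTime())) {
    throw new RangeError(`Invalid date input: ${JSON.stringify(input)}`);
  }
  return parsed;
}
```

A reviewer checking a diff against the constraints is looking for exactly the path this guard covers: any branch that could hand an Invalid Date back to the caller instead of throwing.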

Notes

This page covers using Claude as a development partner across planning, debugging, refactoring, docs, and review through the chat and Projects surfaces. For the terminal agent that edits files and runs commands on its own, see Using Claude Code.
Pairs with Giving AI the Right Context for what to put in front of Claude, Planning Before Coding for the plan step, and Reviewing AI Code Safely for the diff gate every change passes through.