Guide · AI in the Developer Workflow

Giving AI the Right Context

An assistant predicts plausible output from what is in its context window, so the result tracks the files, errors, types, and constraints you actually supply. Your lever of control is the context itself. You decide what goes into that window, in what order, and what stays out, before you ask for a plan or a diff.

ContextConstraintsConventions
~9 min

When to use this

  • The assistant keeps making assumptions that do not match how your project is actually built.
  • A change depends on a type, a config value, or an error that lives outside the obvious file.
  • You paste a long thread and the answers get vaguer the further the conversation goes.
  • The assistant invents a function, a flag, or a package that does not exist in your codebase.

Key ideas

High signal over high volume
A model spends a finite attention budget on every token it processes, and accuracy falls as the window fills with low-value content. Anthropic frames the goal as the smallest set of high-signal tokens that pins down the answer. Aim for the few files and facts the change depends on, not the whole repository.
Position matters
Models attend strongly to the start and end of the context and weakly to the middle. The original lost-in-the-middle study measured accuracy falling by more than 30 percentage points when the answer moved from the edge to the middle of a long input, driven by position alone. Put the load-bearing constraint and the target file near the top or the bottom of your prompt, not buried in a wall of pasted code.
Show the real artifact
Paste the actual file, the full stack trace, and the real type definitions. A prose summary drops the line numbers, field names, and signatures the model needs to be exact. In Claude Code an @-mention pulls the exact file contents into context, and reading files in a subtree also surfaces any nested CLAUDE.md that lives there.
Constraints are context too
What must not change, the convention to follow, and what done looks like are part of the brief. Without them the assistant fills the gap with a plausible guess, and a plausible guess is the kind that survives review and breaks later.
Persistent versus per-task
Project rules that never change belong in a versioned file the tool loads every session, such as CLAUDE.md or AGENTS.md. Facts specific to one task, like the failing test and the error, belong in the prompt. Separating the two keeps the standing file short and the per-task context sharp.

Why this matters

The assistant does not read your repository the way you do. It predicts plausible output from the tokens in front of it. When the right tokens are missing, it does not stop and ask. It fills the gap with the most statistically likely code, which usually looks correct and quietly assumes a world that is not yours.

Picture a developer asking an assistant to fix a date parser. The prompt says "parseDate returns Invalid Date for bad input, make it throw instead." That sounds complete. What the prompt left out: the codebase already has a house rule that invalid input throws a RangeError, and a downstream caller catches exactly that type. The assistant, with no sight of the rule or the caller, writes a clean fix that throws a generic Error. The test the developer was watching goes green. Three files away, the catch (e instanceof RangeError) branch silently stops matching, and a real bad date now crashes the request handler in production.

Nothing in that story is an exotic model failure. The output was plausible, the change was small, and it passed the one check that was visible. The defect lives in the gap between what the developer knew and what the context contained. That gap is the most expensive thing in AI-assisted coding, because a plausible wrong answer survives a quick read in a way an obviously broken one never would.

The payoff of closing the gap is concrete. The shared spine of this whole topic is that the assistant predicts from its context, and your control comes from three levers: the context you supply, the plan you agree before it writes, and the diff you review before you keep it. Context is the first lever, and it sets a ceiling on the other two. A plan built on missing facts plans the wrong thing, and a diff built on a wrong assumption reviews as fine. The mechanism that lets you control the context, the finite attention budget and where the model actually looks, is what the next section breaks down.

How it works

A model has a fixed budget of attention it spends across every token it processes. Anthropic describes good context engineering as finding the smallest possible set of high-signal tokens that maximize the likelihood of the outcome you want. Every token you add depletes that budget, so each added token is a real cost against the model's attention. The parts you are managing are these.

  • The artifacts. The actual files the change touches, the type definitions and config they depend on, and the error or failing test, pasted verbatim so the line numbers and signatures survive.
  • The constraints. What must not change, the convention to follow, and what done looks like. A constraint is context, because it removes a guess the model would otherwise make.
  • The standing rules. Project facts that never change for one task, kept in a versioned file the tool loads automatically rather than re-pasted each time.
  • The conversation history. Everything already said in the thread, which counts against the budget whether or not it is still relevant.

Two facts about how models read decide whether your context lands. The first is context rot: every frontier model tested gets less accurate as the input grows, because a transformer has to track n squared pairwise relationships across n tokens and attention stretches thin. Researchers found model correctness starting to drop around 32,000 tokens even on models advertised with million-token windows. A bigger window is not a reason to paste more.

The second is position. Models attend strongly to the beginning and the end of the context and weakly to the middle, a pattern documented as lost in the middle by Liu and colleagues in 2023. Their study measured accuracy dropping by more than 30 percentage points when the answer document moved from the first position to the middle of a twenty-document context, driven by position rather than content. The practical move is to put the load-bearing constraint and the file under change near the top or the bottom of your prompt, and to resist dumping a long, low-value block in the middle where the thing that matters can hide.

This is why the distinction between persistent and per-task context matters. Standing rules go in a file the tool reads every session. CLAUDE.md is the file Claude Code reads at session start. AGENTS.md is the open cross-tool standard, donated by OpenAI and read natively by Cursor, GitHub Copilot, Codex, Gemini CLI, and others across 60,000 or more repositories. Claude Code does not read AGENTS.md on its own, so a mixed-tool team points to it with a single @AGENTS.md import line at the top of CLAUDE.md. Per-task facts go in the prompt. Keeping the two separate stops the standing file from bloating into the kind of mid-context wall that triggers rot. With the mechanism in hand, the next section walks one change through end to end.

A worked scenario

Maya, a developer, picks up the exact bug from the first section. The util parseDate returns Invalid Date for an out-of-range month, and a failing test now expects it to throw. Her first instinct is the one-line prompt, "make parseDate throw on bad input." She catches herself and builds the context instead.

  1. State the goal and the success behaviour. She writes it as one sentence: "Make parseDate('2026-13-01') throw instead of returning Invalid Date, so the failing test passes."
  2. Attach only what the change touches. She @-mentions src/util/parseDate.ts and parseDate.test.ts, which pulls the exact file contents into context. She does not paste the rest of src/util. Because she uses Claude Code, reading files in that subtree also surfaces the nested CLAUDE.md there, which already states the throw-a-RangeError rule.
  3. Paste the error verbatim. She copies the actual assertion, AssertionError: expected [Function] to throw an error, rather than retyping it. The exact text tells the model the test asserts a throw, not a return value.
  4. Front-load the constraint. At the top of the prompt she writes: keep the parseDate(input: string): Date signature, throw a RangeError to match the house style, and do not add a date library. The most load-bearing rule sits where the model reads it best.
  5. Ask for a restatement first. She ends with "restate the task and list your assumptions before editing." The assistant confirms it will throw a RangeError and asks whether February 29 on a non-leap year also counts as out of range. That clarifying question is worth more than the code, because Maya had not considered it.
  6. Review and commit. The returned diff is nine lines, throws the right type, adds no dependency, and the test goes green. She runs the suite, confirms the downstream catch still matches, and commits.

The before and after is the whole lesson. The one-line prompt would have produced a generic throw new Error(...), a green test, and the silent production crash from the first section. The built context cost Maya maybe ninety seconds and one clarifying exchange, and it produced a fix that respected a rule she did not even have to remember, because it lived in the file. Maya and this same parseDate change carry into the pitfalls below.

Pitfalls and edge cases

Each of these traps feels like giving the model more to work with, while quietly making the output worse.

  • Pasting the whole repo. Dumping every file feels thorough, but it buries the relevant lines under thousands of tokens and triggers context rot. The fix is to attach the file under change plus its direct dependencies, and let the tool retrieve the rest on demand through search or @-mentions.
  • Describing instead of showing. Retyping a bug in your own words drops the line numbers, the exact field names, and the stack frame order. Paste the real artifact. The verbatim trace is the highest-signal context you have for a debugging task.
  • The never-ending thread. A long conversation keeps every earlier message in the window, so a stale assumption from twenty turns ago still steers the answer. When replies start drifting, start a fresh session with a clean, scoped context rather than adding more on top.
  • Leaving done implicit. If you do not say what success looks like, the model optimizes for plausible-looking code rather than your actual outcome. State the success behaviour and the constraints that must hold, as Maya did.

Two genuine edge cases sit beyond the simple rules. The fact that lives off-screen. Sometimes the cause is a value in an environment file or a behaviour in a dependency you would never think to attach. Here the just-in-time approach beats pre-loading: give the agent the ability to search and read, and a lightweight pointer ("the timeout is set in config/http.ts"), rather than guessing in advance which file holds the answer.

The hallucinated dependency. A wrinkle that is still live in 2026: models invent packages that do not exist. A USENIX Security 2025 study generated 576,000 code samples and found commercial models hallucinating package names at around 5% and open-source models near 22%, and over half of those invented names recurred across runs, which is exactly what makes slopsquatting a workable supply-chain attack. An attacker registers the plausible name the model keeps suggesting, and waits for someone to install it. Adding a constraint like "do not add new dependencies without asking" to your standing rules file closes most of this, and any new import in the diff deserves a deliberate check that the package is real before you install it. Keeping these guards consistent across a team is where scale comes in.

Doing it on a team and at scale

One developer can carry the conventions in their head and re-paste them when needed. The moment a second person, a second machine, or a second month is involved, that habit has to live in a versioned artifact, or every prompt drifts a little further from how the project actually works.

The durable artifact is a checked-in rules file the tool loads automatically. CLAUDE.md is read at the start of every Claude Code session. AGENTS.md has become the cross-tool standard, read natively by Cursor, GitHub Copilot, Codex, and others, so a single file can serve a mixed-tool team. Claude Code reads its own CLAUDE.md rather than AGENTS.md, so the common pattern is one AGENTS.md with a one-line @AGENTS.md import at the top of CLAUDE.md, which keeps both tools pointed at the same source of truth. Keep the file short and high-signal: the stack, the naming and error-handling conventions, and a small ## Preferred / ## Avoid block with real code, which is one of the most effective things a context file can contain. Prune it when an agent failure reveals a gap, and resist letting it grow into the mid-context wall that defeats the point.

A lead, Priya, owns the part the individual file cannot. She adds two lines to the pull-request template: which assistant produced the change and the prompt or task that drove it. That record turns a reviewer's vague unease into a concrete question, because reviewing AI output means asking "what assumptions is this code making, and are they safe?" rather than "does this look reasonable?". She also keeps pull requests small, under roughly 400 lines, since review capacity is now the bottleneck, ahead of code volume, and a tight diff is the only kind a human can actually check against the constraints.

Scale also shifts where context comes from. Standing rules go in the versioned file, the failing test and the error stay per-task in the prompt, and shared knowledge the model needs at runtime moves behind tools the agent can query, rather than getting pasted into every conversation. The split keeps each surface lean.

The durable principle to keep, whatever the size of the team: the assistant is only as good as the high-signal context it can see, and your job is to curate that context down to the high-signal minimum. Start with one small thing. Move the three conventions you re-type most often into a CLAUDE.md or AGENTS.md today, and stop pasting them by hand.

Workflow

  1. 01State the goal and the behaviour you expect when the change is working.
  2. 02Attach only the files the change touches, plus the types and config they depend on, by path or @-mention.
  3. 03Paste the exact error output and the failing test verbatim, not a description of them.
  4. 04Put the load-bearing constraint near the top, and spell out what must stay unchanged.
  5. 05Move durable project rules into CLAUDE.md or AGENTS.md so you stop re-pasting them each session.
  6. 06Ask the assistant to restate the task and list its assumptions, and correct any that are wrong before it writes.
  7. 07Review the diff against your constraints, then commit in small steps.

Common mistakes

  • Describing a bug in your own words instead of pasting the full stack trace and the failing assertion.
  • Pasting the whole repository, so the few relevant lines get buried under thousands of irrelevant tokens.
  • Letting a long chat thread accumulate stale context until the answers drift, instead of starting fresh.
  • Leaving the success criterion implicit, so the assistant guesses what done means and optimizes the wrong thing.
  • Repeating the same project conventions in every prompt instead of putting them in a file the tool reads automatically.

Examples

Context block
Goal: fix the failing test in parseDate.test.ts. Constraints (read first): - Keep the signature parseDate(input: string): Date. - Reject out-of-range months and days. Do not add a date library. - Match the existing error style: throw new RangeError(...). File (pasted): src/util/parseDate.ts Failing test (pasted): expects parseDate('2026-13-01') to throw, but it returns Invalid Date. Error (verbatim): AssertionError: expected [Function] to throw an error. Restate the task and list your assumptions before editing.
A CLAUDE.md stub
# Project rules ## Stack - TypeScript, Node 22, Vitest. No default exports. ## Conventions - Throw RangeError for invalid input, never return Invalid Date. - Keep public signatures stable unless the task says otherwise. ## Preferred / Avoid - Preferred: small pure functions in src/util. - Avoid: adding new dependencies without asking.

Notes

  • This page covers what to put in the context window and what to leave out. It does not cover writing the plan or reviewing the diff, which are the next two levers.
  • Pairs with Planning Before Coding, where the agreed plan becomes part of the context, and with Reviewing AI Code Safely, where you check the diff the context produced.