Playbooks & Templates

AI Use Case Canvas

The AI Use Case Canvas is a one-page template you fill in before you build anything, to decide whether AI is the right tool for a specific problem and whether this is the use case worth backing first. It maps the problem, the current cost, the data you have, the human review point, the way you will measure success, and the build-or-buy call. It earns its place when an idea sounds exciting but nobody has yet written down what would make it succeed.


When to use this

Someone proposes "we should use AI for this" and you need a structured way to say yes, not yet, or no.
You have three or four candidate AI ideas competing for the same budget and have to pick which one to back first.
A use case keeps getting pitched but nobody has written down the problem it solves or how you would know it worked.
You are about to spend engineering time building a custom model or workflow that a vendor might already sell.
A stakeholder is quoting a big ROI number and you want to pressure-test the assumptions behind it before committing.
The data the use case depends on is scattered across systems and you are unsure whether it is good enough to start.
You need a one-page artifact a sponsor can read in two minutes and approve, defer, or reject with a clear reason.

What it helps clarify

Whether the real problem is well enough defined that you could tell, later, if AI actually solved it.
Whether AI is the right tool here, or whether a rule, a script, a report, or a process fix would do the job cheaper.
Where a human has to stay in the loop, who that person is, and what they are allowed to approve or override.
Whether the data this use case needs exists, is accessible, and is clean enough to trust on day one.
Whether you should build this in-house or buy a tool that already does it, given that purchased solutions succeed far more often.
The single success signal you will measure, and the time horizon over which a win has to show up.

Why this matters

Most AI ideas die because nobody wrote down what would make them succeed. The 2025 MIT State of AI in Business report (the NANDA study) reviewed 300 disclosed pilots, 150 leadership interviews, and 350 employee surveys, and found that 95% of organizations saw no measurable return from their generative AI initiatives despite an estimated $30–40 billion in spending. The recurring cause was organizational. Teams started to build before the problem, the data, and the success measure were defined.

The canvas exists to catch that gap on one page, before the budget is spent. Each field forces a question that an excited pitch usually skips. What is the problem, in real units? What does the status quo already cost? Would a plain rule or report do the job? Does the data even exist? Where does a human stay in control? How will we know, and by when, that it worked? The discipline is cheap: filling the canvas takes twenty minutes, while discovering the same gaps after a quarter of engineering costs that quarter.

There is a second, sharper reason to fill it. The same MIT study found that pilots built on purchased tools succeeded about two-thirds of the time, while in-house builds succeeded only about one-third of the time. The build-or-buy field is on the canvas because the default answer is usually buy, and teams that skip the question tend to build generic capability they could have rented. The 5% of pilots that worked shared a pattern the canvas is designed to reproduce: a single well-defined pain point, a specialized tool rather than a from-scratch build, and a measurable target set before any code was written.

Picture Priya looking at four AI ideas her team has floated. Without a canvas, the loudest idea wins, or the one with the most confident ROI slide. With a canvas, the idea that cannot fill in "data and inputs" or "success signal" reveals itself in twenty minutes, and the budget goes to the one that can.

How to fill it well

The canvas only works if each entry is concrete enough to be checked later. The difference between a strong canvas and a weak one is almost always specificity. Walk the fields in order; the early ones set up the later ones.

Problem statement. Write the pain, not the product. A weak entry reads "we want to use AI to improve support." A strong entry reads "40% of our ~120 weekly tickets are the same repeat questions, and they sit unanswered for hours during busy periods." The strong version names a who, a frequency, and a felt cost. If your first sentence contains the word "AI," you are describing a solution and have skipped the problem.

Current cost of doing nothing. This is the baseline every later benefit is measured against, so it has to be a number. Weak: "support is slow." Strong: "two engineers lose ~6 hours each per week and median first response is 4 hours." Without this, you cannot prove a win even if you get one.
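The baseline in the strong entry is plain arithmetic, and writing it out makes the number easy to check later. A minimal sketch using the figures from that entry; the 48 working weeks per year is an assumption for illustration, not something the canvas prescribes, and the function names are hypothetical.

```python
# Baseline from the strong entry: two engineers, ~6 hours each per week.
ENGINEERS = 2
HOURS_EACH_PER_WEEK = 6
WORKING_WEEKS_PER_YEAR = 48  # assumption for illustration; adjust to your team


def weekly_hours_lost(engineers: int, hours_each: float) -> float:
    """Total hours the team loses per week to the status quo."""
    return engineers * hours_each


def annual_hours_lost(engineers: int, hours_each: float, weeks: int) -> float:
    """Weekly loss scaled to a year, so it can sit next to an annual budget."""
    return weekly_hours_lost(engineers, hours_each) * weeks


weekly = weekly_hours_lost(ENGINEERS, HOURS_EACH_PER_WEEK)
annual = annual_hours_lost(ENGINEERS, HOURS_EACH_PER_WEEK, WORKING_WEEKS_PER_YEAR)
print(f"{weekly} hours/week, {annual} hours/year")
```

Keeping the inputs as named constants means the pilot review can rerun the same calculation with measured values instead of estimates.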

Why AI, and not a simpler tool. This is the field that kills the most use cases, on purpose. Many ideas pitched as AI are really a missing report, a stale FAQ, or a process fix. A strong entry earns the model: "the questions are phrased a hundred different ways, so keyword routing misses them, and matching intent across phrasings is what a language model is good at." If a WHERE clause or a scheduled report would solve it, name that and stop. AI is the expensive answer; reach for it only when the cheaper ones genuinely fail.

Data and inputs. Be honest about access and quality, because data readiness is the single most common reason AI projects stall. Weak: "we have lots of data." Strong: "18 months of resolved tickets in a system we own and can export, messy but labeled by category, with names we will redact first." A blank or hand-wavy answer here is a finding, not a formality.

Human in the loop. Name the review point, the person, and the authority they keep. Strong: "the assistant drafts a reply and cites the doc it used; a support engineer approves before it sends; AI never auto-sends." Weak: "a human will check it sometimes." If any step lets AI act without review, write that down explicitly and justify it, because that is where the real risk lives.

Success signal and horizon. Pick one measurable signal, a target, and a date. Strong: "median first response on the top 5 question types drops from 4 hours to under 30 minutes within 8 weeks, with no rise in reopened tickets." Weak: "improve customer satisfaction." Aim the horizon at the use case, not at a calendar. A common trap is judging everything on a 12-month payback, which quietly rejects a slower structural win; for an initial pilot, a 6–12 month signal is healthy, and you break anything longer into sub-12-month checkpoints.
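One way to keep a success signal honest is to record it as data: a baseline, a target, a deadline, and the guardrail that must not get worse. The sketch below uses the numbers from the strong entry (4 hours down to under 30 minutes within 8 weeks, no rise in reopened tickets). The class name, function names, and the pilot start date are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SuccessSignal:
    name: str
    baseline_minutes: float  # value before the pilot
    target_minutes: float    # success means measuring strictly under this
    deadline: date           # date by which movement is expected


def reduction_needed(signal: SuccessSignal) -> float:
    """Fraction by which the baseline must fall to reach the target."""
    return (signal.baseline_minutes - signal.target_minutes) / signal.baseline_minutes


def signal_met(signal: SuccessSignal, measured_minutes: float,
               measured_on: date, guardrail_ok: bool) -> bool:
    """True only if the target is beaten, on time, with the guardrail intact."""
    return (measured_minutes < signal.target_minutes
            and measured_on <= signal.deadline
            and guardrail_ok)


pilot_start = date(2025, 1, 6)  # hypothetical start date
signal = SuccessSignal(
    name="median first response, top 5 question types",
    baseline_minutes=4 * 60,
    target_minutes=30,
    deadline=pilot_start + timedelta(weeks=8),
)

print(f"Required reduction: {reduction_needed(signal):.1%}")
# guardrail_ok stands for "no rise in reopened tickets"
print(signal_met(signal, measured_minutes=25,
                 measured_on=pilot_start + timedelta(weeks=6), guardrail_ok=True))
```

Folding the guardrail into the same check is a design choice: a faster response that comes with more reopened tickets should not count as a win.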

Build or buy. Default to buy for generic capability and reserve build for what is specific to you. A strong entry gives the reason: "two vendors already ship suggested-reply features that read our docs; building a custom pipeline would take a quarter to match a two-week pilot." Building generic capability you could rent is one of the most expensive mistakes on this page.

Risks, then decision. Name what a confidently wrong answer breaks, who it hurts, and how you would catch it, including privacy and bias. Then record an actual decision with an owner: back it now, defer it with a named condition, or drop it. A canvas with every field full and no decision at the bottom is a note, not a decision.

Pitfalls and using it on a team

The canvas can feel finished while the thinking is still hollow. A few traps recur.

The solution masquerading as a problem. The first field reads "use AI to summarize tickets." That is a feature, not a pain. Restart and describe what hurts and what it costs, or you will optimize a solution nobody needed.
The ROI horizon that rejects a good slow win. A use case with a strong three-year payoff fails a rigid 12-month screen and gets dropped. Set the horizon to fit the use case and break long bets into checkpoints, so a real structural improvement is not killed by an arbitrary deadline.
Building what you could buy. The same 2025 MIT study found in-house builds succeeded roughly half as often as purchased tools. A canvas that defaults to "build" for generic capability is pointing at a likely failure. Make "buy" the starting assumption and force "build" to argue for itself.
A confident success number with no baseline. If "current cost of doing nothing" is blank, the ROI slide is fiction. You cannot show a 70% reduction without the original figure.
No human-in-the-loop entry for a high-stakes action. If the AI can send a message, change a record, or move money, and the review field is empty, that is the most dangerous gap on the page, not the least.

On a team, the canvas turns "which AI idea do we fund" from a debate into a comparison. When Priya runs four candidates through the same nine fields, they line up side by side, and the one that cannot fill in data or a success signal stops being a contender. A practical portfolio split many teams use is to put the bulk of effort into quick, high-feasibility wins that build confidence, a smaller share into a strategic bet, and a slice into deliberate learning, with nothing going to the idea that fails its own canvas.
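The side-by-side comparison can be sketched as a small filter over candidate canvases. This is an illustrative sketch only: the field keys, the function names, and candidates B and C are hypothetical placeholders, and only the first candidate reuses entries from this page's worked example.

```python
# Fields the text singles out: a candidate that cannot fill these drops out.
REQUIRED_FIELDS = ("data_and_inputs", "success_signal")

# Hypothetical candidates; B and C are placeholders, not real use cases.
candidates = {
    "support inbox triage": {
        "data_and_inputs": "18 months of resolved tickets we own and can export",
        "success_signal": "median first response under 30 minutes within 8 weeks",
    },
    "candidate B (hypothetical)": {
        "data_and_inputs": "",
        "success_signal": "placeholder signal with a target and a date",
    },
    "candidate C (hypothetical)": {
        "data_and_inputs": "placeholder data source with a named owner",
        "success_signal": "",
    },
}


def blank_required(entries: dict[str, str]) -> list[str]:
    """Required fields that are missing or contain only whitespace."""
    return [f for f in REQUIRED_FIELDS if not entries.get(f, "").strip()]


def contenders(all_candidates: dict[str, dict[str, str]]) -> list[str]:
    """Names of candidates with every required field filled in."""
    return [name for name, entries in all_candidates.items()
            if not blank_required(entries)]


for name, entries in candidates.items():
    gaps = blank_required(entries)
    status = "contender" if not gaps else "not ready, blank: " + ", ".join(gaps)
    print(f"{name}: {status}")
```

A filled field is not the same as a good one, so a check like this only removes the candidates that fail outright; judging the quality of each entry remains a human call.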

Keep the filled canvases. At the pilot review you reopen the artifact, write the real numbers into the success field, and let the decision line record whether you expand, hold, or stop. That history is what stops the same rejected idea from being re-pitched every quarter as if it were new.

When the decision is "back it now," hand off to the next artifact. If you are building, the AI Coding Session Brief scopes the first session so the assistant gets the goal, the files, and the constraints. If the use case lets AI take action on its own, run the Agent Readiness Checklist before you wire it up, and the MCP Readiness Checklist before you connect it to a real tool. For the success-signal field, the Outcome Over Output note in Better Ways of Working is the companion read. On your next real idea, fill just the first three fields: problem, current cost, and why AI. If you cannot fill them, you have your answer before you have spent a dollar.

The canvas

Nine fields. Fill them top to bottom in one sitting; if you cannot answer a field honestly, that gap is the finding.

Problem statement : Describe the actual pain in one or two sentences. Who feels it, how often, and what it costs them today. Write the problem, not the AI solution.
Current cost of doing nothing : Quantify the status quo. Hours per week, dollars per month, error rate, or wait time. This is the baseline every benefit gets measured against.
Why AI (and not a simpler tool) : State why this needs a language model or a learned pattern rather than a rule, a SQL query, a report, or a process change. If a simpler tool would work, name it and stop here.
Data and inputs : List the data the use case consumes, where it lives, who owns it, and whether it is accessible and clean today. Note any gap that has to close before you can start.
Human in the loop : Name the point where a person reviews, approves, or overrides the AI, who that person is, and what decisions they keep. Mark any step where AI acts without review and justify it.
Success signal and horizon : Define the one measurable signal that means this worked, its target, and the date by which you expect to see movement. One signal, not a dashboard.
Build or buy : Decide whether you build this in-house or adopt an existing tool, and give the reason. Default to buy for generic capability; reserve build for what is genuinely specific to you.
Risks and failure modes : List what goes wrong if the AI is confidently incorrect, who gets hurt, and how you would catch it. Include privacy, bias, and the cost of a silent wrong answer.
Decision : Record the call (back it now, defer with a named condition, or drop it) and the owner. A canvas with no decision is a note, not a decision.
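If you track canvases in a script or a repository, the nine fields map naturally onto a small record with a check for blanks. A minimal sketch, assuming free-text fields; the class name, attribute names, and the `unanswered` helper are illustrative and not part of the canvas itself.

```python
from dataclasses import dataclass, fields


@dataclass
class Canvas:
    # The nine fields, in the order the canvas asks for them.
    problem_statement: str = ""
    current_cost: str = ""
    why_ai: str = ""
    data_and_inputs: str = ""
    human_in_the_loop: str = ""
    success_signal: str = ""
    build_or_buy: str = ""
    risks: str = ""
    decision: str = ""


def unanswered(canvas: Canvas) -> list[str]:
    """Names of fields left blank; each blank is a finding, not a formality."""
    return [f.name for f in fields(canvas) if not getattr(canvas, f.name).strip()]


# A draft with only the first two fields filled, using the example entries.
draft = Canvas(
    problem_statement="40% of ~120 weekly tickets are the same repeat questions",
    current_cost="Two engineers lose ~6 hours each per week",
)

gaps = unanswered(draft)
print(f"{len(gaps)} of {len(fields(Canvas))} fields still blank:")
for name in gaps:
    print(f"  - {name}")
```

The check only detects empty fields. Whether an entry is specific enough to be verified later still needs a reader.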

Example

Worked example: triaging the support inbox at an internal tools team

Problem statement : Priya’s team answers ~120 internal support tickets a week. Roughly 40% are the same handful of "how do I export paginated orders" questions, and they sit unanswered for hours during busy periods.
Current cost of doing nothing : Two engineers spend ~6 hours each per week on repeat questions. Median first response is 4 hours. Three escalations last quarter were just a missed FAQ.
Why AI (and not a simpler tool) : A static FAQ already exists and is ignored. The questions are phrased a hundred different ways, so keyword routing misses them. Matching intent across phrasings is what a language model is genuinely good at.
Data and inputs : 18 months of resolved tickets in Zendesk (we own it, exportable), plus the internal docs site. Tickets are messy but labeled by category. No PII beyond names; we can redact before use.
Human in the loop : The assistant drafts a reply and cites the doc it used. A support engineer approves or edits before it sends. AI never auto-sends. Sam owns the review queue.
Success signal and horizon : Median first response on the top 5 question types drops from 4 hours to under 30 minutes within 8 weeks, with no rise in reopened tickets.
Build or buy : Buy. Two helpdesk vendors already ship suggested-reply features that read our docs. Building a custom retrieval pipeline would take a quarter to match what we can pilot in two weeks.
Risks and failure modes : A confidently wrong answer about permissions could send someone down the wrong path. Mitigation: human approval on every send, plus a "not sure, escalating" fallback the model is told to prefer.
Decision : Back it now as a 2-week pilot on the top 5 question types. Owner: Priya. Condition to expand: the success signal holds for two consecutive weeks.

Usage notes

Fill it for the problem, not the solution. If the first field already names a model or a product, restart and describe the pain instead.
Treat an unanswerable field as the result. A blank "Data and inputs" or "Success signal" is the canvas doing its job, telling you this use case is not ready.
Default to buy for anything generic. MIT’s 2025 State of AI in Business study found purchased tools succeeded about twice as often as in-house builds; reserve "build" for what is specific to you.
Revisit the canvas at the pilot review, not just at kickoff. Update the success signal with real numbers and let the decision field record whether you expand, hold, or stop.
When the decision is "back it now," the next artifact is the AI Coding Session Brief if you are building, which scopes the first session, or the Agent Readiness Checklist if the use case lets AI act on its own.
Read the Outcome Over Output note in Better Ways of Working alongside this; it sharpens the success-signal field from "use AI" to "what changes for whom, and how we will know."

Copyable canvas

# AI Use Case Canvas: [use case name]

## Problem statement
- Who feels the pain, how often, what it costs today:

## Current cost of doing nothing
- Baseline (hours / dollars / error rate / wait time):

## Why AI (and not a simpler tool)
- Why a rule, script, report, or process fix will not do:

## Data and inputs
- Data needed / where it lives / owner / accessible and clean today?:
- Gap to close before starting:

## Human in the loop
- Review or approval point / who / what they decide:
- Any step where AI acts without review (and why):

## Success signal and horizon
- One measurable signal / target / date we expect movement:

## Build or buy
- Decision and reason (default to buy for generic capability):

## Risks and failure modes
- What a confident wrong answer breaks / who is hurt / how we catch it:

## Decision
- Back now / defer (condition) / drop:
- Owner:

Downloadable version

A one-page canvas that takes a candidate AI idea from problem to decision in nine fields.

Preview

Field	What to capture	Example entry
Frame the problem
Problem statement	The real pain in one or two sentences: who feels it, how often, what it costs. Write the problem, not the AI.	40% of ~120 weekly tickets are the same repeat questions that sit unanswered for hours.
Current cost of doing nothing	The status-quo baseline in real units: hours, dollars, error rate, or wait time.	Two engineers lose ~6 hours each per week; median first response is 4 hours.
Why AI (and not a simpler tool)	The reason a rule, script, report, or process fix will not do the job as well or cheaper.	Static FAQ is ignored; questions are phrased a hundred ways, so keyword routing misses them.
Check feasibility
Data and inputs	The data the use case needs, where it lives, who owns it, and whether it is accessible and clean today.	18 months of resolved tickets we own and can export; messy but labeled; redact names first.
Human in the loop	The review or approval point, who staffs it, and what they keep authority over. Flag any unreviewed action.	Assistant drafts and cites a doc; a support engineer approves before send; AI never auto-sends.
Build or buy	The build-versus-buy call and the reason. Default to buy for generic capability.	Buy: two vendors ship suggested-reply features that read our docs; building would take a quarter.
Decide and de-risk
Success signal and horizon	The one measurable signal that means this worked, its target, and the date you expect movement.	Median first response on top 5 question types drops from 4 hours to under 30 minutes in 8 weeks.
Risks and failure modes	What a confident wrong answer breaks, who is hurt, and how you would catch it. Include privacy and bias.	Wrong permissions advice misleads a user; mitigated by human approval and an escalate fallback.
Decision	The call (back now, defer with a named condition, or drop) and the owner.	Back now as a 2-week pilot on the top 5 questions; owner Priya; expand if the signal holds two weeks.
