Practice · Better Ways of Working
Outcome Over Output
Output is what you ship. Outcome is the change it creates for someone. This practice is the habit of pausing the "build this" reflex long enough to name the problem you are solving, the person who has it, and the observable signal that will tell you it worked. It turns a feature into a claim you can check, so you stop funding assumptions on faith.
When this matters
- A request arrives as a solution ("add a button", "build a report") with no stated problem behind it.
- A team is busy shipping every sprint but cannot say whether any of it has helped a real person.
- You are deciding whether a piece of work is worth doing at all and need a basis beyond "someone senior asked".
- A roadmap is a list of features with dates, and nobody can name what changes for a customer if it all ships.
Key ideas
- A feature is a bet
- Every output is a guess that it will produce some change. Saying the bet out loud lets you check it later instead of assuming the work paid off because it shipped. The odds are humbling: Pendo found about 80% of features in the average product are rarely or never used, and at large platforms 80-90% of experiments leave the target metric flat or move it the wrong way.
- Problem, person, signal
- Three questions reframe almost any request. What problem are we solving, who has it, and what observable signal will tell us it improved? If you cannot name the third, you cannot tell a real result from motion. "Improve the experience" stays a wish, because nothing you observe could ever prove it false.
- Outcome is the change itself
- An outcome is a change in behavior or in the business, written so a number could confirm or deny it. "Conversion rises from 15% to 25%" is an outcome. "Ship ten features" is an output. Product outcomes (a behavior shift you can watch this week) are leading indicators; revenue and churn are lagging indicators that arrive too late to steer by.
- Smallest bet, soonest signal
- You learn whether an outcome is real by shipping the smallest version that could move the signal, then reading that signal within days, while the result is still cheap to act on. This mirrors how discovery teams work: start from the outcome, move to the opportunity behind it, then pick the smallest solution that tests the riskiest assumption. A weekly habit of testing the next assumption beats one large output that delays the only feedback that matters.
Why this matters
Teams that measure themselves by what they ship can stay busy for years without helping anyone. The output looks like progress: features close, the burndown chart drops, the demo goes well. Whether any of it changed a customer behavior or a business number is a separate question, and it usually goes unasked.
The data on this is uncomfortable. Pendo analyzed feature usage across hundreds of products and found that roughly 80% of features in the average software product are rarely or never used. Companies that run disciplined experimentation tell a similar story. Published figures from large platforms suggest 80-90% of changes leave the metric they were meant to improve flat or move it in the wrong direction. One widely cited figure from Slack put the share of monetization experiments that did not yield the intended result at 70%. This is careful, well-designed work that simply did not produce the change the teams assumed it would.
Here is what goes wrong in practice. A stakeholder in finance asks for an export-to-CSV button. The team builds it, ships it, and closes the ticket. A quarter later, finance is still re-keying data for an hour every week, because the real problem was a date format the export did not fix. The output shipped. The outcome never arrived. Nobody noticed, because nobody had written down what they expected to change or when they would check.
The payoff of the opposite habit is concrete. When you name the problem, the person, and the signal before building, you can kill a bad idea in two weeks instead of carrying it for a year. You stop arguing about opinions in the demo and start reading the number. This is the shift product teams have made over the last decade, and it is now mainstream: surveys report that around 92% of product leaders own a revenue or outcome metric, roughly double the share of a few years ago. The mechanism that makes the shift work is a small, repeatable framing you can apply to any request, which the next section unpacks.
How it works
The core move is to translate a request into a checkable claim before you build it. Three plain words carry most of the weight: problem, person, signal. Get those right and the rest of the method follows.
- Output is the thing you ship: the button, the report, the API, the screen. It is easy to observe and easy to count. Counting it tells you the team was busy, nothing more.
- Outcome is the change the output is meant to cause: a behavior that shifts, a cost that drops, a number that moves. It is the reason the output exists.
- Signal is the observable measure of that change. A useful signal is specific enough that a number could prove the bet wrong. "Improve the experience" cannot be wrong, so it is a wish. "Share of trial users reaching their first saved project within 48 hours rises from 22% to 35%" can be wrong, so it is a signal.
Signals come in two kinds, and the distinction decides whether you can steer. A leading indicator is a behavior you can watch within days, like activation rate or time-to-first-value. A lagging indicator, like quarterly revenue or annual churn, arrives after the fact, when it is too late to change the bet that caused it. Prefer a leading signal for the look-back, the scheduled date when you read the signal and decide what to do next. Let the lagging one confirm the trend later. A workable outcome reads like a SMART statement: specific, measurable, achievable, relevant, and time-bound.
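A minimal Python sketch can make "a number could prove the bet wrong" concrete. It makes a few assumptions:

- A bet is confirmed when the observed value reaches the target on or after the look-back date.
- Partial movement toward the target is reported separately.
- Anything else counts as refuted.

The `Signal` class, its fields, and these verdict labels are illustrative, not a standard. The sketch handles metrics where lower is better by checking which side of the baseline the target sits on. The numbers reuse the activation example above, and the date is hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Signal:
    """Hypothetical record of an outcome written so a number could refute it."""
    metric: str      # what is observed
    baseline: float  # value before the change
    target: float    # value the bet claims the metric will reach
    look_back: date  # the date the signal will be read

    def verdict(self, observed: float, on: date) -> str:
        """Compare one observation with the claim and return a plain verdict."""
        if on < self.look_back:
            return "too early"
        # "Better" is up when the target sits above the baseline, down otherwise.
        if self.target >= self.baseline:
            reached, moved = observed >= self.target, observed > self.baseline
        else:
            reached, moved = observed <= self.target, observed < self.baseline
        if reached:
            return "confirmed"
        if moved:
            return "moved, short of target"
        return "refuted"


activation = Signal(
    metric="share of trial users reaching first saved project within 48 hours",
    baseline=0.22,
    target=0.35,
    look_back=date(2025, 3, 14),  # hypothetical review date
)
print(activation.verdict(0.36, date(2025, 3, 14)))  # confirmed
print(activation.verdict(0.28, date(2025, 3, 14)))  # moved, short of target
print(activation.verdict(0.21, date(2025, 3, 14)))  # refuted
```

The "too early" branch enforces the look-back date. Reading the signal before the agreed date is how a bet gets declared a win on noise.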
The second move is sizing the bet. Once you have an outcome and a signal, identify the smallest version that could plausibly move that signal. This mirrors how discovery teams structure work. Start from the desired outcome at the top, branch to the customer opportunity that drives it, then to candidate solutions, then to the single riskiest assumption each solution rests on. You build the smallest thing that tests that one assumption, and you leave the full vision for later.
A worked instance: the outcome is to cut the hour finance loses each week to re-keying. The signal is that logged hour. The riskiest assumption is that a CSV export removes the manual step at all. So the smallest bet is to export just the one report they copy most, ship it, and read the time log after two weeks. If the hour does not move, you have spent two weeks learning the export was the wrong fix, instead of a quarter learning it after building the full feature. By the end of this framing, you can take almost any "build this" request and turn it into a bet you can settle, which the next section does on a single running example.
A worked scenario
Maya is a product manager on a small SaaS team. A request lands from sales: "Customers keep asking for a dark mode, build dark mode." It is phrased as a solution with a deadline attached. Maya runs the framing before the team estimates anything.
- Problem and person. She asks sales what the customers actually said. The real complaint is from power users who work late and report eye strain after long sessions. So the problem is comfort during long evening sessions, and the person is the heavy daily user.
- Outcome. The change she wants is for long-session users to keep working comfortably into the evening, where today they drop off. Written as a signal she can read: evening session length for heavy users stops shortening after 7pm, and the related "too bright" support tickets fall.
- Signal and where it lives. Session length by hour is already in the analytics tool. Support tickets tagged "display" live in the help desk. Both are observable within two weeks, so they are leading indicators she can act on quickly.
- Smallest bet. The riskiest assumption is that brightness is the cause at all. Full dark mode across every screen is weeks of work. The smallest test is a single dimmed theme on the two screens power users live in, behind a setting, shipped to the 200 heaviest users.
- Look-back date. Maya puts a review on the calendar for two weeks out, before any code is written, so the check is a commitment.
Two weeks later the numbers come back. Evening session length for the test group has held steady where it had been declining, and "too bright" tickets from that group have dropped by about half. The bet has paid off, so the team continues, widening the theme to all screens. Had the signal stayed flat, Maya would have stopped and reopened the problem after spending days on the test instead of a full sprint cycle. The reframe also changed what got built: the outcome set the scope, narrowing the work to a focused dimmed theme on two screens, well short of a full visual overhaul.
Pitfalls and edge cases
The framing is simple, which is exactly why it gets skipped or distorted. A few traps recur.
The vanity outcome. Teams reach for a number that always goes up, like total logins or pageviews, because it makes the review comfortable. The fix is to pick a signal tied to the specific behavior you claimed would change, even when it might embarrass you. If the honest signal is hard to move, treat that as useful information and keep it; swapping in an easier number only hides the truth.
Gaming the metric. Once a number becomes the target, people optimize the number itself while the outcome behind it stays untouched. A team told to raise "activation" can technically lift it by counting a trivial click as activation. Guard against this by keeping the human problem visible next to the metric, and by pairing the leading signal with a quality or guardrail measure that would catch a hollow win.
The slow signal. Some outcomes genuinely take a quarter to show, like annual renewal or enterprise churn. You cannot read those in two weeks. Here you use a leading proxy for the look-back (a usage pattern that historically predicts renewal) and treat the lagging number as later confirmation, so you still get a fast read without pretending the slow metric moved.
The process with no owner. If nobody owns the signal, the look-back quietly never happens. Name who reads the number and when, in the same breath as the bet. An outcome without an owner and a date is a wish with extra steps.
The genuinely exploratory bet. Sometimes the thing you are buying is information, like a spike to learn whether an integration is even feasible. State that honestly. The outcome here is a decision ("we will know by Friday if this is buildable in the quarter"), and the signal is the decision being made on time. Framing this way keeps research labeled as research and keeps a value claim labeled as a value claim, so neither one gets dressed up as the other. Once a single bet is framed cleanly, the question becomes how a whole team keeps doing it under pressure, which is where this scales.
Doing it with a team and at scale
One framed bet is a habit; a team that frames every bet is a culture, and the difference is cadence. The practice survives when it is built into the rhythm the team already has, so it stays part of the work and avoids becoming extra ceremony.
The lightweight artifact that holds it together is a one-line bet per item: problem, person, signal, smallest output, look-back date. Keep these on the roadmap itself. The roadmap then reads as a list of changes you expect to cause, with each line naming a person and a signal. When requests enter through a single intake queue, the entry rule is that nothing can be scheduled until it has a problem and a candidate signal attached. That one gate filters out much of the "someone asked" work before it consumes a sprint.
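The one-line bet and the intake gate can be sketched as a small record plus a check. This is a minimal Python illustration under stated assumptions:

- The `Bet` fields mirror the five parts named above.
- `can_schedule` encodes only the stated rule, that a problem and a candidate signal must both be present.
- The example values, including the date, are hypothetical and drawn loosely from the CSV export story.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Bet:
    """Hypothetical one-line bet: problem, person, signal, smallest output, look-back date."""
    request: str                      # the request as it arrived
    problem: str = ""
    person: str = ""
    signal: str = ""
    smallest_output: str = ""
    look_back: Optional[date] = None

    def one_line(self) -> str:
        """Render the bet as a single roadmap line."""
        when = self.look_back.isoformat() if self.look_back else "no date"
        return (f"{self.person} | {self.problem} | signal: {self.signal} | "
                f"ship: {self.smallest_output} | look back: {when}")


def can_schedule(bet: Bet) -> bool:
    """Intake gate: no scheduling without a problem and a candidate signal."""
    return bool(bet.problem.strip()) and bool(bet.signal.strip())


raw = Bet(request="add a CSV export button")
framed = Bet(
    request="add a CSV export button",
    problem="an hour a week lost re-keying report data",
    person="finance analyst",
    signal="weekly re-keying time in the time log",
    smallest_output="export of the one most-copied report",
    look_back=date(2025, 6, 2),  # hypothetical date
)
print(can_schedule(raw))     # False
print(can_schedule(framed))  # True
print(framed.one_line())
```

The gate deliberately ignores the request text. A request that names only a solution fails until someone fills in who has the problem and what would show it improved.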
The cadence has two beats. Discovery teams run a continuous habit, often a weekly touchpoint with real users, so assumptions get updated before delivery commitments harden. Delivery teams run the look-back. This is a short standing review where each bet that has reached its date is read against its signal and marked continue, adjust, or stop. The marking matters as much as the building, because a recorded "stop" is what keeps the idea from returning next quarter under a new name.
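The look-back marking, and the record that keeps a stopped idea from resurfacing, can be sketched as follows. The decision rule is an assumption for illustration, not part of the practice:

- Continue if the signal reached its target.
- Adjust if it moved toward the target by more than a noise allowance.
- Stop otherwise.

The hypothetical `Ledger` keys each result by the assumption the bet tested. It normalizes only case and whitespace, so it catches a repeat of the same wording. Recognizing the same idea under a genuinely new name still takes a person reading the record. The readings in the usage lines are hypothetical.

```python
from dataclasses import dataclass, field


def decide(baseline: float, target: float, observed: float, noise: float = 0.0) -> str:
    """Hypothetical look-back rule: continue if the target was reached,
    adjust if the signal moved toward it by more than `noise`, else stop."""
    if target >= baseline:  # higher is better
        reached, moved = observed >= target, observed > baseline + noise
    else:                   # lower is better
        reached, moved = observed <= target, observed < baseline - noise
    if reached:
        return "continue"
    if moved:
        return "adjust"
    return "stop"


@dataclass
class Ledger:
    """Records each look-back decision against the assumption the bet tested."""
    results: dict[str, str] = field(default_factory=dict)

    @staticmethod
    def _key(assumption: str) -> str:
        # Crude normalization (case and whitespace only); a renamed idea needs a human.
        return " ".join(assumption.lower().split())

    def record(self, assumption: str, decision: str) -> None:
        self.results[self._key(assumption)] = decision

    def previously_stopped(self, assumption: str) -> bool:
        return self.results.get(self._key(assumption)) == "stop"


# Hypothetical readings: weekly minutes spent re-keying, where lower is better.
ledger = Ledger()
ledger.record("CSV export removes the manual step", decide(60, 15, 58, noise=5))
print(ledger.previously_stopped("csv export removes  the manual step"))  # True
```

The `noise` parameter is the design choice worth noticing. Without it, any small wobble in the right direction reads as progress, and that is how a flat result gets marked "adjust" and re-funded.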
This also connects outward to the rest of how a team works. Engineering measurement has moved the same direction: the DORA program, which studies software delivery performance, reframed its metrics around outcomes and in 2025 added a fifth, a rework or failure-driven measure, precisely because delivery speed alone told teams what was happening but not whether it was worth doing. When you have more candidate outcomes than capacity, prioritization frameworks like WSJF (cost of delay divided by job size) let you sequence the framed bets by economic value, so the loudest request stops winning by volume alone.
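WSJF reduces to one division and a sort. The sketch below assumes both inputs are relative estimates on a shared scale, which is a common way teams score them. The `Candidate` and `sequence` names are hypothetical. The candidate names reuse bets from this page, and the scores are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_of_delay: float  # relative value lost while the outcome waits
    job_size: float       # relative effort, on the same scale for every candidate

    @property
    def wsjf(self) -> float:
        """Weighted Shortest Job First: cost of delay divided by job size."""
        if self.job_size <= 0:
            raise ValueError(f"{self.name}: job size must be positive")
        return self.cost_of_delay / self.job_size


def sequence(candidates: list[Candidate]) -> list[Candidate]:
    """Order framed bets by WSJF, highest first."""
    return sorted(candidates, key=lambda c: c.wsjf, reverse=True)


# Hypothetical relative scores for three framed bets.
bets = [
    Candidate("full dark mode", cost_of_delay=8, job_size=8),             # WSJF 1.0
    Candidate("dimmed theme, two screens", cost_of_delay=6, job_size=2),  # WSJF 3.0
    Candidate("one-report CSV export", cost_of_delay=5, job_size=1),      # WSJF 5.0
]
for c in sequence(bets):
    print(f"{c.wsjf:.1f}  {c.name}")
```

The largest item has the highest cost of delay yet lands last, because its size dilutes the ratio. This is the effect that stops the loudest request from winning by volume alone.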
The durable principle is small. Treat every piece of work as a claim about a change for a person, write the claim so a number could prove it wrong, ship the smallest version that tests it, and look. Do that consistently and the 80% waste figure stops being a statistic about other teams and starts being the pile of bad bets you killed early.
Practical steps
- 01 Restate the request as a problem and the person who has it, in one plain sentence, before any solution words appear.
- 02 Write the outcome you expect, described as a change that will visibly happen for that person or in the business.
- 03 Pick a signal you could actually observe within a sensible window, prefer a leading indicator, and note exactly where the data lives.
- 04 Choose the smallest output that could plausibly move that signal, and write down the single assumption it tests.
- 05 Set a look-back date before you start building, so the review becomes a fixed commitment on the calendar.
- 06 Ship the small version, then look at the signal on the agreed date and decide to continue, adjust, or stop.
- 07 Record the result next to the original bet, so the same unverified idea cannot quietly return next quarter.
Common mistakes
- Counting shipped features as success and never returning to check whether the signal actually moved.
- Defining the goal so vaguely ("improve the experience", "drive value") that no observation could confirm or deny it.
- Choosing a lagging metric like quarterly revenue as the only signal, so the feedback arrives long after you could act on it.
- Skipping the look-back date, which lets the same unverified bets get re-funded under a new name every planning cycle.
- Treating the outcome as a target to hit at any cost, which invites gaming the number instead of solving the problem.
Notes
- This page covers framing a single piece of work as an outcome. Comparing many framed bets against each other to decide sequence is a separate skill.
- Pairs with Prioritization Without Chaos when several outcomes compete for the same week, and with Powerful Questions for the discovery questions that surface the real problem behind a request.