Agile estimation techniques compared: story points, planning poker, t-shirts, hours, no-estimates

Every team that runs sprints picks an estimation method. Most pick whatever the previous team used. Some never revisit it. A few hate the one they picked but can’t articulate what the alternative would actually feel like.

This post is for that last group. Five real estimation techniques, what each one is good at, what each one breaks under, and how to switch without burning a sprint.

The estimation techniques you’ll actually meet

In practice, you’ll meet roughly five:

Story points (Fibonacci) — relative sizing on a 1/2/3/5/8/13 scale.
Planning poker — story points, but elicited via blind votes per ticket.
T-shirt sizing — XS/S/M/L/XL — relative sizing without numbers.
Ideal hours — absolute time estimates (“this is about 6 hours of work”).
No-estimates — count tickets, not points; rely on throughput.

Everything else (planning poker plus, magic estimation, dot voting, bucket estimation) is a flavour or a workshop technique on top of one of these five.

What each technique is actually doing

Estimation is two separate jobs glued together:

Surfacing disagreement — making the team notice when one engineer thinks a ticket is small and another thinks it’s huge.
Forecasting — letting the team and stakeholders predict how much work fits in the next sprint.

Different techniques optimise for one job or the other. That’s the fork.

Technique	Surfaces disagreement?	Forecasts well?	Setup cost
Story points	OK	Yes (with history)	Low
Planning poker	Excellent	Yes (with history)	Medium
T-shirt sizing	OK	Weak	Lowest
Ideal hours	Poor	Misleadingly precise	Lowest
No-estimates	None	Yes (with throughput)	Low

If you’re optimising mostly for forecasting, story points or no-estimates win. If you’re optimising for catching the “wait, you think that’s a 3?” moment, planning poker is the only one built for it. T-shirt sizing is a set of training wheels. Ideal hours is, almost always, a trap.

Story points (Fibonacci)

The default for most Scrum teams. The team agrees a baseline ticket — “remember that auth bug we shipped last month? That was a 3” — and sizes new tickets relative to it on a Fibonacci scale.

What it’s good at: Relative sizing scales well. Humans are bad at “is this 6 or 8 hours?” but pretty good at “is this bigger or smaller than that ticket we did?” The non-linear scale (1/2/3/5/8/13) prevents false precision: you can’t argue whether a ticket is a 6 or a 7 if those aren’t options.

Where it breaks: Two failure modes.

Calibration drift. Three months in, the team’s “5” no longer matches the original baseline. You think velocity is steady at 38 — actually you’re estimating bigger tickets at the same number. (See velocity dropped — here’s the actual playbook.)
Treating points as time. A stakeholder asks “is a 5 about a day?” and someone answers “yeah, roughly.” From that moment, points are time. Once that happens, all the benefits of relative sizing are gone.
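
One way to catch calibration drift early is to compare how long tickets of the same size took a few months ago against how long they take now. The sketch below is a minimal version of that check, assuming you can export each finished ticket's point estimate and its cycle time in days. The `Ticket` record, the sample data, and the 1.5x threshold are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Ticket:
    points: int        # story-point estimate at planning time
    cycle_days: float  # days from "in progress" to "done" (assumed export field)


def median_days_per_size(tickets):
    """Median cycle time for each point value."""
    by_size = {}
    for t in tickets:
        by_size.setdefault(t.points, []).append(t.cycle_days)
    return {size: median(days) for size, days in by_size.items()}


def drift_report(baseline, recent, threshold=1.5):
    """Return point sizes whose recent median cycle time grew by more than `threshold`x."""
    old = median_days_per_size(baseline)
    new = median_days_per_size(recent)
    drifted = {}
    for size in sorted(old.keys() & new.keys()):
        ratio = new[size] / old[size]
        if ratio > threshold:
            drifted[size] = round(ratio, 2)
    return drifted


# Hypothetical data: a "5" used to take about 2 days and now takes about 4.
baseline = [Ticket(3, 1.0), Ticket(3, 1.5), Ticket(5, 2.0), Ticket(5, 2.5)]
recent = [Ticket(3, 1.0), Ticket(3, 1.5), Ticket(5, 4.0), Ticket(5, 5.0)]
print(drift_report(baseline, recent))  # {5: 2.0}
```

This is a private sanity check for the team, not a points-to-days conversion to hand to stakeholders. Publishing the ratio would trigger the second failure mode.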

Use it when: You’re running Scrum, sprints are 1-2 weeks, the team is roughly stable, and you can build up 4-6 sprints of velocity history.
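
The velocity history is what turns points into a forecast. A rough sketch of that arithmetic follows, assuming a plain list of completed points per sprint. The function name, the sample numbers, and the choice to report a min/mean/max range are illustrative, not a prescribed method.

```python
from statistics import mean


def velocity_forecast(history, window=6):
    """Forecast next sprint's capacity in points from the last `window` sprints.

    Returns (low, expected, high), where low and high are the worst and best
    recent sprints, so the forecast is a range rather than a single number.
    """
    recent = history[-window:]
    if len(recent) < 4:
        raise ValueError("need at least 4 sprints of history for a usable forecast")
    return min(recent), round(mean(recent), 1), max(recent)


# Hypothetical completed points per sprint, oldest first.
history = [31, 38, 36, 40, 35, 38]
low, expected, high = velocity_forecast(history)
print(f"plan for ~{expected} points, expect between {low} and {high}")
```

Reporting a range instead of one number makes it harder for a single velocity figure to harden into a commitment.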

Planning poker

Same scale as story points, but the elicitation is structured: everyone holds a card face-down, all reveal at once, then the highest and lowest justify their numbers and the team re-votes.

What it’s good at: Catching information asymmetry. The senior engineer who knows the legacy module thinks the ticket is a 13. The new joiner thinks it’s a 3. Both numbers are correct given what each person knows — the gap is the signal. Planning poker forces that gap into the open before sprint commitment.
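
The reveal step is simple enough to sketch in a few lines. This version assumes votes arrive as a name-to-card mapping and treats only a unanimous vote as converged. That rule is a simplification (many teams accept adjacent cards), and the names and data are hypothetical.

```python
FIBONACCI = (1, 2, 3, 5, 8, 13)


def reveal(votes):
    """Process one blind-vote round.

    `votes` maps engineer name -> card value. Returns whether the team has
    converged and, if not, who should explain their number before a re-vote.
    """
    for name, card in votes.items():
        if card not in FIBONACCI:
            raise ValueError(f"{name} played {card}, which is not on the scale")
    low, high = min(votes.values()), max(votes.values())
    if low == high:
        return {"converged": True, "estimate": low, "explain": []}
    # The highest and lowest voters justify their numbers; then everyone re-votes.
    explain = sorted(n for n, c in votes.items() if c in (low, high))
    return {"converged": False, "estimate": None, "explain": explain}


# Hypothetical round: the senior engineer knows the legacy module.
round_one = {"senior": 13, "new_joiner": 3, "mid": 5}
print(reveal(round_one))
round_two = {"senior": 8, "new_joiner": 8, "mid": 8}
print(reveal(round_two))
```

The function never averages the cards. The output of a split round is a list of people to hear from, not a number.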

Where it breaks: Time. A real planning poker session for 10-12 tickets runs 60-90 minutes. Teams shortcut it (“we’ll just take the senior’s number”) and lose the entire benefit. If your team can’t afford 90 minutes for planning, planning poker isn’t for you.

Use it when: Tickets vary widely in difficulty, the team has mixed experience with the codebase, and you’ve noticed that “small” tickets keep blowing up mid-sprint. The blow-ups are usually the disagreements you didn’t surface.

(SprintFlint’s story-point estimator is closer in spirit to planning poker — surface the gap, let the team converge.)

T-shirt sizing

XS / S / M / L / XL. No numbers. No Fibonacci. No conversion.

What it’s good at: Two things. Onboarding — teams new to estimation pick this up in 10 minutes; story points take a sprint or two of confusion. Coarse roadmap planning — when a PM asks “is this feature a small one or a big one?”, t-shirts answer that without forcing a fake number.

Where it breaks: Forecasting. You can’t sum t-shirts. “Four mediums and an XL” doesn’t give you a sprint forecast. Most teams that start with t-shirts eventually graduate to story points the moment a stakeholder asks “how many of these can we ship next quarter?”

Use it when: You’re estimating for roadmap-level planning, not sprint commitment. Or you’re a brand-new team that needs training wheels before story points.

Ideal hours

“This is about 6 hours of work.” Sometimes called ideal time or engineering hours.

What it’s good at: Almost nothing in agile sprint planning. It’s listed because you’ll meet it — usually inherited from a waterfall culture or a PM who finds points “too abstract.”

Where it breaks: Several places, all at once.

It anchors on the optimist. The engineer estimating can finish in 6 hours, but only if there are no interruptions, no merge conflicts, and the existing code matches the expected shape. None of those hold in real sprints.
Stakeholders treat it as a commitment. “You said 6 hours, and it’s been three days.” The 6 hours assumed uninterrupted work, and those three days never contained 6 uninterrupted hours.
It hides scope drift. A ticket that grew from “6 hours” to “12 hours” looks like an estimation miss. When a ticket grows from “3 points” to “8 points”, the conversation turns to scope, because points are about scope.
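
The gap between “6 hours” and “three days” is plain arithmetic once you admit how little of a working day is uninterrupted. The sketch below assumes a hypothetical `focus_fraction`, the share of each day spent on focused work on one ticket. The 0.3 default is an illustrative guess, not a measured value.

```python
import math


def elapsed_days(ideal_hours, workday_hours=8.0, focus_fraction=0.3):
    """Working days needed to accumulate `ideal_hours` of focused work.

    `focus_fraction` is the assumed share of each day that is uninterrupted
    work on this ticket; 0.3 is illustrative, not a measured value.
    """
    if not 0 < focus_fraction <= 1:
        raise ValueError("focus_fraction must be in (0, 1]")
    focused_per_day = workday_hours * focus_fraction
    return math.ceil(ideal_hours / focused_per_day)


print(elapsed_days(6))                      # 3 days at 2.4 focused hours a day
print(elapsed_days(6, focus_fraction=1.0))  # 1 day under the optimist's assumption
```

Both answers come from the same 6-hour estimate. The stakeholder hears the second one, and the sprint delivers the first.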

Use it when: Almost never. The one case: a sole engineer working on a contract with hourly billing. Everyone else: pick literally any other technique on this list.

No-estimates

You don’t estimate. You count tickets. You measure throughput (tickets/sprint) and forecast based on that.

What it’s good at: Cutting estimation overhead. A planning meeting goes from 90 minutes to 20. Engineers stop arguing about whether something is a 3 or a 5. The team focuses on slicing tickets to a similar size — which is the actually-useful skill — instead of estimating arbitrary tickets.

Where it breaks: Two prerequisites that not every team meets.

Tickets need to be roughly similar size. No-estimates collapses if half your tickets are 2-day chunks and the other half are 5-minute chip-aways. Throughput becomes meaningless. The discipline that keeps no-estimates working is aggressive ticket slicing — every story below ~2 days, every spike timeboxed.
Throughput history needs to exist. “We complete 14 tickets per sprint” only forecasts when you have 6+ sprints of data and the work shape hasn’t changed.
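
With enough history, a throughput forecast can be sketched as a small Monte Carlo simulation: replay the backlog many times, drawing each sprint's ticket count at random from past sprints. This is one common way to do it, not the only one. The function name, the 85% confidence level, and the sample history are assumptions for illustration.

```python
import random


def sprints_to_finish(backlog, throughput_history, runs=2000, confidence=0.85, seed=0):
    """Monte Carlo forecast: how many sprints to clear `backlog` tickets.

    Each run draws sprint throughput at random from history until the backlog
    is empty. Returns the sprint count that `confidence` of the runs met.
    """
    if len(throughput_history) < 6:
        raise ValueError("need 6+ sprints of throughput data")
    if max(throughput_history) <= 0:
        raise ValueError("history must contain at least one sprint with finished tickets")
    rng = random.Random(seed)  # fixed seed so the forecast is repeatable
    outcomes = []
    for _ in range(runs):
        remaining, sprints = backlog, 0
        while remaining > 0:
            remaining -= rng.choice(throughput_history)
            sprints += 1
        outcomes.append(sprints)
    outcomes.sort()
    index = min(len(outcomes) - 1, int(confidence * len(outcomes)))
    return outcomes[index]


# Hypothetical tickets completed per sprint, oldest first.
history = [12, 15, 14, 13, 16, 14]
print(sprints_to_finish(40, history))
```

The simulation is only as good as the first prerequisite. If ticket sizes vary wildly, each historical count means something different and the output is noise.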

Use it when: Your team has the discipline to slice consistently, sprints are 1-2 weeks, and your stakeholders can hear “we’ll ship 14 tickets next sprint, here’s which ones” rather than insisting on a points number.

Picking one — three honest questions

If you’re choosing or switching, three questions:

1. What’s the actual problem with your current method?

Specific symptoms map to specific techniques:

Tickets keep blowing up mid-sprint? Planning poker (you’re missing disagreement).
Velocity is steady but stakeholders don’t trust it? Story points + a calibration check; or no-estimates with throughput history.
Estimation is taking longer than the work itself? T-shirts for roadmap, no-estimates for sprint.
Stakeholders are converting points to hours and getting angry? Either retrain them, or move to no-estimates so there are no numbers to convert.

2. What does your team have history in?

Switching techniques resets your forecasting accuracy for ~3 sprints. Don’t switch unless the current method is actively hurting. The cost of switching is real.

3. Are you using the technique, or just performing it?

The most common failure isn’t picking the wrong technique. It’s picking the right one and not actually doing the thing it’s for. Planning poker without the discussion. Story points that secretly mean hours. T-shirts stretched into sprint forecasts they can’t support. No-estimates with no throughput tracking. The technique is fine; the discipline isn’t there.

Switching without burning a sprint

If you’re moving from one technique to another (most common: hours → points, or points → no-estimates):

Run both for one sprint. Estimate every ticket the new way and the old way. Don’t tell stakeholders yet. You’ll catch where the new technique behaves weirdly on your work.
Reset velocity expectations. Tell stakeholders explicitly: “we’re recalibrating for about three sprints, and the velocity number won’t be comparable.” Without this, you’ll get pressure based on a baseline that doesn’t apply anymore.
Don’t reuse the old labels. If you were estimating in hours and you switch to points, do not let people say “a 3 is about 3 hours.” That’s not a switch.
Wait six sprints before judging. Most teams who switch and immediately revert do so because the first three sprints feel chaotic. They always do. The forecast accuracy comes back at sprint 5-6.
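
For the dual-run sprint, a quick way to see where the new technique “behaves weirdly” is to look for ticket pairs that the two methods rank in opposite order. The sketch below assumes each ticket carries both estimates as a pair. The ticket ids and numbers are hypothetical.

```python
from itertools import combinations


def discordant_pairs(estimates):
    """Find ticket pairs that the old and new technique order differently.

    `estimates` maps ticket id -> (old_estimate, new_estimate), for example
    (ideal hours, story points). Ties in either method are ignored.
    """
    flagged = []
    for (a, (old_a, new_a)), (b, (old_b, new_b)) in combinations(sorted(estimates.items()), 2):
        # A negative product means one method says a > b and the other says a < b.
        if (old_a - old_b) * (new_a - new_b) < 0:
            flagged.append((a, b))
    return flagged


# Hypothetical dual-run sprint: hours (old) and points (new).
sprint = {
    "T-1": (4, 2),
    "T-2": (6, 3),
    "T-3": (16, 3),
    "T-4": (8, 8),
}
print(discordant_pairs(sprint))  # [('T-3', 'T-4')]
```

Only the ordering is compared, never the values, so the check doesn't smuggle in an hours-to-points conversion. Each flagged pair is a prompt for a conversation about what one method saw that the other missed.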

The one rule that matters more than the technique

The technique you pick matters less than this: whatever you pick, the team picks together, and stakeholders learn what the numbers mean.

A team that runs ideal hours with full alignment and a stakeholder who understands them will out-forecast a team running planning poker where half the engineers don’t believe in it and the PM secretly converts points to days.

If you remember one thing from this post: estimation works when the whole loop (engineers + PM + stakeholders) trusts the same numbers. The fight isn’t between techniques. It’s between teams that built that trust and teams that didn’t.
