How to estimate story points: the only practical guide you'll actually use

Most story points content is vague training-deck filler. Here's the practical version — what story points actually measure, the modified-Fibonacci scale that works, the baseline-setting step most teams skip, the 30-seconds-per-ticket loop, six patterns for splitting too-big stories, and what AI estimates are worth.

May 5, 2026  ·  10 min read  ·  SprintFlint Team

Most “how to estimate story points” content reads like a corporate training course: vague principles, planning poker rituals, and 1,500 words that don’t help on a Monday morning. This post is the practical version. By the end you’ll have a way to estimate that your team will actually use, the relative scale that works in practice, and an honest take on the rituals (planning poker, t-shirt sizing) — what’s worth keeping and what to skip.

What story points actually measure

Story points represent relative effort, not time. A 5-point story is roughly five times as much work as a 1-point story. Whether that takes a day, a week, or two engineers depends on the team — and that’s the point. The number says “this much, relative to that other thing.”

The reason teams adopted points over hours: humans are bad at predicting how long something will take but reasonably good at saying “this is bigger than that.” Removing the time-clock pressure also removes the temptation to pad.

Effort here is composed of three things, fused into one number:

  1. Volume: how many units of work
  2. Complexity: how much novelty or technical difficulty
  3. Uncertainty: how confident the team is in the estimate itself

A 5-point story might be a chunky-but-known piece of work, or a small-but-novel investigation. Both deserve the same number because both will take roughly the same effort.

The scale that actually works: modified Fibonacci

The scale to use: 1, 2, 3, 5, 8, 13, ?

Why these numbers? Because the gaps grow as estimates get bigger, which mirrors human uncertainty. You can usually tell the difference between a 1 and a 2. You probably can’t tell the difference between an 11 and a 12. Forcing the team into 8 vs 13 is more honest.

The “?” replaces 21+. If a story feels bigger than 13, it isn’t a story — it’s an epic, and you should split it before committing to it. (See: story splitting, below.)
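If it helps to see the scale as a rule, here is a minimal sketch that snaps a rough "about N times the smallest ticket" judgement onto the buckets. The function name `to_bucket` and the tie-breaking rule (round up to the larger bucket) are assumptions made for illustration; the scale itself and the "?" cut-off come from the text above.

```python
SCALE = [1, 2, 3, 5, 8, 13]


def to_bucket(relative_size: float) -> str:
    """Snap a rough 'N times the smallest ticket' judgement onto the scale."""
    if relative_size <= 0:
        raise ValueError("relative size must be positive")
    if relative_size > SCALE[-1]:
        # Bigger than 13: not a story any more, split it first.
        return "?"
    # Nearest bucket wins. On a tie, prefer the larger bucket
    # (an assumption: it errs on the side of caution).
    nearest = min(SCALE, key=lambda bucket: (abs(bucket - relative_size), -bucket))
    return str(nearest)
```

With this rule, `to_bucket(11)` and `to_bucket(12)` both return `"13"`, which is the point of the growing gaps: the scale refuses to pretend the team can tell those two apart. `to_bucket(20)` returns `"?"`.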

Some teams use t-shirt sizes (XS, S, M, L, XL) instead of points. Functionally the same. Use whichever your team prefers — fight battles that matter.

How to set the baseline (this is what most teams skip)

Most failed estimation comes from skipping the baseline-setting step. A team’s points only mean something to that team, because they are a relative scale. So before you estimate anything, you need an anchor.

The cheapest baseline-setting exercise:

  1. Pull 10 representative tickets from the last two sprints
  2. Ask: “which one was the smallest? Call that 1.”
  3. Ask: “is each remaining ticket about 1×, 2×, 3×, 5×, 8×, or 13× as big?”
  4. Don’t argue about specific numbers — argue about which bucket
  5. Write it down and pin it in the team channel

Now you have an anchor: the team knows what a “3” looks like. Future estimation references this. After 5 sprints or so, expect the anchor to have drifted toward whatever the team’s consensus has become. That’s fine.

Without this step, every team member estimates against their own private scale and the points are noise. With it, you have something useful within 2 sprints.
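The output of that exercise is just a small lookup table. As a rough sketch, assuming the team has already agreed a bucket for each reference ticket, the pinned message could be generated like this. The ticket IDs and the function name `build_anchor_table` are made up for the example.

```python
from collections import defaultdict

SCALE = (1, 2, 3, 5, 8, 13)


def build_anchor_table(bucketed: dict[str, int]) -> str:
    """Turn {ticket id: agreed bucket} into a message worth pinning."""
    by_points: dict[int, list[str]] = defaultdict(list)
    for ticket, points in bucketed.items():
        if points not in SCALE:
            raise ValueError(f"{ticket}: {points} is not on the scale")
        by_points[points].append(ticket)

    lines = ["Estimation anchors"]
    for points in SCALE:
        tickets = by_points.get(points)
        if tickets:  # skip buckets with no reference ticket yet
            lines.append(f"{points}: " + ", ".join(sorted(tickets)))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical reference tickets from the last two sprints.
    print(build_anchor_table({"WEB-12": 1, "WEB-19": 3, "WEB-7": 3, "WEB-31": 8}))
```

Empty buckets are left out on purpose: a missing “5” is a hint to go find a reference ticket for it rather than a formatting problem.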

The actual estimation: 3 questions, 30 seconds per ticket

This is the practical loop. For each ticket up for estimation:

Q1: Is this similar to anything we’ve done?
If yes, point it like that thing. Done.

Q2: If no, what would it take?
Each engineer says one of: “trivial / small / medium / large / huge / can’t tell.”

Q3: Convergence check.
If everyone said the same: point it (1/2/3/5/8/13). Move on.

If estimates diverge by more than one bucket, the highest and lowest estimator each give a 30-second reason. This is where the real value comes from: usually the high estimator knows about a hidden complexity, or the low estimator knows about an existing utility that solves it. After they explain, re-estimate. Don’t average. Pick a number, move on.

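This is essentially planning poker, just stripped of the cards and the hour-long ceremony. A 6-person team can estimate 15 tickets in about 25 minutes this way: most tickets take the 30 seconds, and the ones that trigger a divergence discussion take longer.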
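The convergence check in Q3 is mechanical enough to sketch in code. This is an illustration under assumptions, not a prescribed tool: the names are hypothetical, votes are taken as numbers already on the scale, and the case where votes land exactly one bucket apart (which the loop above does not spell out) is treated as "pick one of the two and move on".

```python
SCALE = [1, 2, 3, 5, 8, 13]


def convergence_check(votes: dict[str, int]) -> dict:
    """Decide what happens next for one ticket, given each engineer's vote."""
    if not votes:
        raise ValueError("need at least one vote")
    # Position of each vote on the scale; raises ValueError if a vote is off-scale.
    positions = {name: SCALE.index(points) for name, points in votes.items()}
    low = min(positions, key=positions.get)
    high = max(positions, key=positions.get)
    spread = positions[high] - positions[low]

    if spread == 0:
        return {"status": "agreed", "points": votes[low]}
    if spread == 1:
        # Assumption: adjacent buckets need no discussion, just pick one.
        return {"status": "pick", "options": [votes[low], votes[high]]}
    # More than one bucket apart: lowest and highest each explain for 30 seconds.
    return {"status": "discuss", "explain": [low, high]}
```

For example, `convergence_check({"ana": 2, "ben": 5, "cy": 8})` returns `{"status": "discuss", "explain": ["ana", "cy"]}`. Note that nothing here averages the votes; after the explanations the team votes again.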

What planning poker is good at, and where it goes wrong

Planning poker — where everyone reveals their estimate at the same time on a card — is genuinely useful for one specific reason: it prevents anchoring. If a senior engineer says “this is a 5” out loud, juniors will agree even if they think it’s an 8. Hidden reveal stops that.

It goes wrong when:

  • It becomes a 90-minute ceremony for 12 tickets
  • The team argues about “5 vs 8” instead of moving on
  • Estimators fold to the most senior person rather than holding their estimate when they have insight

If your team is junior-heavy and prone to deferring to seniority, keep planning poker as a hidden-reveal mechanism. If your team is experienced and the dynamic is healthy, drop the cards — just have everyone say a number out loud at the same time.

Splitting stories that are too big

Anything that lands on 13 or “?” is too big for a sprint. Split it.

Six practical splits, in order of preference:

1. Split by happy path vs edge cases. “Implement login” → “Login (happy path)” + “Login (forgot password / locked / expired token edge cases).”

2. Split by user role. “Build admin dashboard” → “Admin dashboard for super-admin” + “Admin dashboard for org-admin.”

3. Split by data input. “Import from CSV” → “Import schema A (most users)” + “Import schema B (10% of users).”

4. Split by interface layer. “Build the API” → “API endpoint” + “Frontend wiring” + “Background job for the slow part.”

5. Split by validation. “Add the new flow” → “Add the flow” + “Add validation + error handling for the flow.”

6. Split by experiment. “Build the new feature” → “Build it behind a feature flag for 5% of users” + “Roll out + monitor + clean up.”

If you can’t split a 13-pointer into smaller pieces using one of these patterns, the work isn’t actually understood yet. Add a 1-point spike: “investigate X, propose split.”

Things that are NOT story points

Common confusions worth naming:

  • Time. A 5-pointer isn’t 5 hours or 5 days. The team’s velocity tells you how points convert to time.
  • Difficulty. A novel-but-small problem can be 8 points; a tedious-but-easy migration can be 8 points. Same number, different texture.
  • Productivity score. Velocity is a forecasting tool, not a manager’s whip. The moment “we need to push velocity up” enters standup, your numbers stop meaning anything.
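To make the "Time" bullet concrete: points only become time through velocity. A minimal sketch, assuming you forecast from the plain average of the last few sprints (one common choice, not the only one) and using made-up numbers:

```python
import math
from statistics import mean


def sprints_needed(remaining_points: int, recent_velocities: list[int]) -> int:
    """Forecast how many sprints the remaining backlog will take."""
    if not recent_velocities:
        raise ValueError("need at least one completed sprint")
    velocity = mean(recent_velocities)  # points completed per sprint, on average
    if velocity <= 0:
        raise ValueError("average velocity must be positive")
    # Round up: a partly used sprint is still a sprint on the calendar.
    return math.ceil(remaining_points / velocity)
```

With 60 points left and recent sprints of 21, 18, and 24 points, `sprints_needed(60, [21, 18, 24])` returns `3`. The same 60 points on a team averaging 10 per sprint would be 6 sprints, which is why a point count on its own says nothing about dates.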

What about new teams with no history?

You don’t have a baseline. Three options:

Option A: Borrow. Use the modified Fibonacci scale, anchored to “a typical small bug fix is a 2.” Point everything relative to that. You’ll calibrate within 3-4 sprints.

Option B: Hours-first. For the first two sprints, estimate in ideal hours and treat one ideal day as 1 point, then transition to relative pointing once you have anchors. This gets you forecasting sooner but trains the team to think time-first, which is the wrong mental model long-term.

Option C: Don’t estimate. Use #NoEstimates: track ticket throughput per sprint instead. Works for steady streams of similar-sized tickets. Less useful for forecasting feature delivery.

Option A is what most teams should do. Tell the team explicitly: “we’re recalibrating our scale every retro for the first 5 sprints, then it stabilises.”

What about AI-generated estimates?

LLMs can estimate stories from the description text. They’re surprisingly OK at it — usually within one bucket of what the team would say.

Useful as a sanity check (“the AI says 5, I estimated 8 — what am I missing?”). Not a replacement for the team’s judgement, because the AI doesn’t know your codebase, your team’s velocity history, or what’s “trivial here because we have a helper for it.”

SprintFlint does AI-suggested estimates on every ticket. The team can override; the suggestion is for the conversation.

TL;DR

  • Story points = relative effort, fusing volume + complexity + uncertainty.
  • Use 1, 2, 3, 5, 8, 13, ?. Anything bigger gets split.
  • Set a baseline before you start: pin about 10 reference tickets to the channel.
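  • 30 seconds per ticket: if it’s similar to past work, point it like that. If not, everyone votes; a spread of more than one bucket triggers a 30-second explanation from the highest and lowest estimators, then re-estimate, pick a number, and move on.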
  • Planning poker’s hidden reveal earns its keep when the team defers to seniority. Otherwise drop the cards and have everyone say a number at the same time.
  • Six story-splitting patterns; if none fits, you don’t understand the work yet.
  • Don’t confuse points with time, with difficulty, or with productivity.
