“Is our velocity good?” is the most common question engineering managers ask after their team’s first quarter of running sprints. It’s also a question with no honest, clean answer — and yet there are benchmarks worth knowing.
This post is what we wish someone had handed us when we first asked it: realistic ranges, the variables that actually matter, and the diagnostic questions that turn a bare velocity number into something useful.
The short answer
Most healthy product engineering teams of 4–6 people running 2-week sprints with a Fibonacci scale (1, 2, 3, 5, 8, 13) settle into a velocity in the 30–60 story-point range, after 4–6 sprints of calibration.
Read that with three caveats:
- The unit is not standardised. Team A’s 5 isn’t Team B’s 5. Comparing point totals across teams is noise.
- Stable matters more than high. A team consistently shipping 35 ± 4 points is healthier than one swinging between 20 and 70.
- Trend matters more than level. A team going from 30 → 35 → 40 over a quarter is doing well at any starting number.
If you came here for “shoot for 50”, that’s the headline. The rest of this post is why that number is almost meaningless without the context around it.
What “good” actually depends on
Five variables dominate the velocity number. Get any one of them wrong in your interpretation and the benchmark falls apart.
1. Sprint length
Velocity scales nearly linearly with sprint length: a 1-week team running at 18 is roughly equivalent to a 2-week team at 35.
| Sprint length | Typical healthy range |
|---|---|
| 1 week | 15–30 points |
| 2 weeks | 30–60 points |
| 3 weeks | 45–90 points |
| 4 weeks | 60–120 points |
Short sprints carry proportionally more standup, retro, and planning overhead, so two 1-week sprints deliver slightly less than one 2-week sprint. But it’s close enough to plan with.
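The scaling rule above can be sketched as a quick normaliser. This is illustrative only, and it deliberately ignores the ceremony-overhead discount:

```python
def normalize_velocity(points: float, sprint_weeks: float, target_weeks: float = 2.0) -> float:
    """Rescale a velocity to a different sprint length, assuming the
    near-linear scaling described above (overhead discount ignored)."""
    return points * (target_weeks / sprint_weeks)

# A 1-week team at 18 points maps to a 2-week 36 under pure linear
# scaling -- close to the 35 quoted above once overhead is discounted.
print(normalize_velocity(18, 1))  # 36.0
```

Use it only to sanity-check comparisons, not to set targets.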
2. Team size
Velocity is not linear with team size, because of communication overhead and code review bottlenecks. Doubling a team rarely doubles output — typically 1.6–1.8×.
| Team size | Multiplier on a 4-person baseline |
|---|---|
| 2 people | 0.55× |
| 4 people | 1.0× (baseline) |
| 6 people | 1.4× |
| 8 people | 1.7× |
| 10 people | 1.9× |
If you’re already at 8+ people and considering hiring “to ship more”, the marginal velocity increase is small. Splitting into two teams of 4–5 with separate sprints often beats it.
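The multipliers in the table can be turned into a rough estimator, interpolating linearly for in-between team sizes. The multipliers are the tabulated ones; the function itself is an illustrative sketch:

```python
# Multipliers from the table above (4-person team = 1.0x baseline).
SIZE_MULTIPLIER = {2: 0.55, 4: 1.0, 6: 1.4, 8: 1.7, 10: 1.9}

def expected_velocity(baseline_4p: float, team_size: int) -> float:
    """Estimate velocity for a given team size from a 4-person baseline,
    interpolating linearly between the tabulated multipliers."""
    sizes = sorted(SIZE_MULTIPLIER)
    if team_size <= sizes[0]:
        return baseline_4p * SIZE_MULTIPLIER[sizes[0]]
    if team_size >= sizes[-1]:
        return baseline_4p * SIZE_MULTIPLIER[sizes[-1]]
    for lo, hi in zip(sizes, sizes[1:]):
        if lo <= team_size <= hi:
            frac = (team_size - lo) / (hi - lo)
            mult = SIZE_MULTIPLIER[lo] + frac * (SIZE_MULTIPLIER[hi] - SIZE_MULTIPLIER[lo])
            return baseline_4p * mult

# A 4-person team at 40 points, grown to 8 people, lands near 68 --
# not the 80 that naive doubling would predict.
print(expected_velocity(40, 8))
```

The flat tail past 10 people is the point of the table: the marginal hire buys very little velocity.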
3. Story-point scale
The most common scales:
- Fibonacci (1, 2, 3, 5, 8, 13) — by far the most popular. Numbers in this post assume Fibonacci.
- Linear (1–10) — teams using it tend to under-distinguish between a 3 and a 5, and stories feel about 30% smaller, so velocity numbers run roughly 25–40% higher than Fibonacci.
- T-shirt sizes (XS/S/M/L/XL) translated to points — varies wildly. Don’t compare across teams using different translations.
If you’re being told “30 is good” by a benchmark and you’re on a non-Fibonacci scale, mentally adjust before drawing conclusions.
4. Work mix
Three teams shipping 40 points each can be doing entirely different things:
- New product team — mostly greenfield features, low debt, fast cycle time.
- Maturing product team — feature work + bug fixing + minor refactoring.
- Late-stage team — heavy maintenance, security work, customer-reported bugs.
Velocity holds steady, but the composition changes. Teams report that their velocity in story points often stays flat for years while the team’s “feel” of throughput collapses, because new features become a smaller and smaller share. Track the completion mix, not just the number.
5. Definition of Done
A team that calls a ticket “done” when the PR is merged will look ~30–50% faster than a team that requires every ticket to be staging-tested, documented, and deployed to prod. Same humans, same code; different shipping bar.
Before benchmarking against another team, ask what their Definition of Done looks like. Without that, the comparison is meaningless.
The benchmark table
With those caveats noted, here are the ranges most engineering teams converge to. Numbers assume Fibonacci scale, 2-week sprint, full-time team members:
| Team size | Healthy velocity range | Caution if below | Reset if above |
|---|---|---|---|
| 2 | 16–30 | < 12 | > 40 |
| 3 | 22–42 | < 16 | > 55 |
| 4 | 30–55 | < 22 | > 75 |
| 5 | 38–65 | < 28 | > 85 |
| 6 | 42–75 | < 32 | > 95 |
| 7 | 47–82 | < 35 | > 105 |
| 8 | 50–88 | < 38 | > 115 |
“Caution if below” means investigate — likely culprits are blockers, an unclear sprint goal, or a hidden bottleneck.
“Reset if above” is rarer but real. It usually means estimates have inflated (Goodhart’s law biting), or the Definition of Done has slipped. High velocity is not a goal — it’s a measurement.
Diagnostic questions that beat the benchmark
Whatever number you got, run through these five questions before reacting to it.
“Is our velocity stable?”
Calculate standard deviation across the last 5 sprints. If it’s more than 25% of the mean, you don’t actually have a velocity yet — you have noise. Don’t forecast on it.
Common causes of high variance: unevenly sized stories, dependencies on other teams, recurring outages, high attrition.
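That stability check takes a few lines to automate. A minimal sketch, using the stdlib’s sample standard deviation and the 25% bar above (the sprint numbers are made up):

```python
from statistics import mean, stdev

def velocity_is_stable(sprint_velocities: list[float], threshold: float = 0.25) -> bool:
    """True if the sample standard deviation of recent sprint velocities
    is within `threshold` (25% by default) of the mean."""
    return stdev(sprint_velocities) / mean(sprint_velocities) <= threshold

print(velocity_is_stable([33, 35, 38, 31, 36]))  # True  -- safe to forecast on
print(velocity_is_stable([20, 70, 25, 60, 30]))  # False -- noise, not a velocity
```

If it returns False, spend the next retro on the variance causes above before quoting any average.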
“Are estimates calibrated?”
Pick 5 completed stories at random from the last 2 sprints. For each, ask: “if we did this again, how would we score it now?” If 3+ would change by more than one Fibonacci step, the team’s calibration is off. Recalibrate before trusting any benchmark comparison.
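One way to run that spot-check, sketched in Python. The story samples are invented; the “3+ of 5, more than one step” bar is the one above:

```python
FIB = [1, 2, 3, 5, 8, 13]

def step_drift(original: int, reestimate: int) -> int:
    """Fibonacci steps between the original estimate and a fresh re-estimate."""
    return abs(FIB.index(original) - FIB.index(reestimate))

# (original, re-estimate) pairs for 5 randomly picked completed stories.
samples = [(5, 5), (3, 8), (8, 3), (2, 2), (5, 13)]
drifted = sum(step_drift(o, r) > 1 for o, r in samples)
print(f"{drifted} of 5 drifted by more than one step")  # 3 of 5 -> recalibrate
```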
A free tool that can help: our Sprint Health Check takes 60 seconds and surfaces calibration drift alongside 7 other warning signs.
“What’s our cycle time?”
A team with high velocity but a 10-day median cycle time is shipping in fewer, bigger chunks. That’s risky — slower feedback, longer review queues, more rework. A team with the same velocity and a 3-day cycle time is dramatically healthier.
Velocity is a sprint-level number; cycle time is a flow-level number. Healthy teams watch both.
“Is the work mix what we expect?”
Categorise last sprint’s completed stories: features / bugs / debt / ops. Is the ratio what leadership thinks it is? At healthy product teams, “new feature” work is typically 50–70% of completed points; if it’s drifting toward 30–40% without anyone noticing, the team is slowly getting reabsorbed by maintenance.
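Counting the mix takes a few lines once stories are tagged. A sketch with made-up categories and numbers:

```python
def feature_share(completed: list[tuple[str, int]]) -> float:
    """Fraction of completed story points categorised as new-feature work."""
    total = sum(points for _, points in completed)
    return sum(points for category, points in completed if category == "feature") / total

# Last sprint's completed stories as (category, points) -- illustrative.
sprint = [("feature", 14), ("bug", 12), ("debt", 8), ("ops", 6)]
print(f"{feature_share(sprint):.0%}")  # 35% -- below the 50-70% band
```

A number like 35% with a flat velocity is exactly the silent-reabsorption pattern described above.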
“Is velocity being used as a target?”
If anyone above the team is celebrating high-velocity weeks or punishing low ones, you’ve created an inflation pressure. Either estimates will start creeping upward, the Definition of Done will subtly relax, or the work that doesn’t fit neatly into stories will start getting hidden. All three are bad.
Velocity is a forecasting tool. It is not an OKR.
What to do this week
- Calculate your last 5 sprints’ velocity. The free calculator takes about 30 seconds.
- Note the standard deviation. Anything > 25% of the mean → spend the next sprint investigating variability before trusting the average.
- Compare against the table above, adjusted for your team size and sprint length. Are you in the healthy range? If not, which of the diagnostic questions points at the gap?
- Don’t share velocity numbers across teams unless every team has the same Definition of Done and the same scale. If you must share, share trend lines, not levels.
A note on AI-coded teams
Teams using AI coding assistants (Cursor, Copilot, Claude Code) consistently report velocity uplifts of 15–35% within the first 2–3 sprints, then a plateau. The plateau matters: AI buys you faster typing, but the team’s planning, review, and deploy gates are still the rate-limiting steps for most stories.
If your AI-assisted velocity uplift is over 50%, double-check that your Definition of Done hasn’t quietly slipped — fewer tests, less review, faster merges. The number is going up, but the bar might be going down.
TL;DR
- 4–6 person teams, 2-week sprints, Fibonacci scale: 30–60 points is the healthy band.
- Stability and trend matter more than level.
- Sprint length, team size, scale, work mix, and Definition of Done all dominate the number — adjust before comparing.
- Don’t turn velocity into a target. It will start lying to you within a quarter.
SprintFlint computes your velocity, capacity, and burndown automatically as you close tickets — no spreadsheet drift, no shared formula links going stale. Free for the first 300 tickets, no card.