
Story points aren't time (as OP states). They're relative complexity and uncertainty (hence the Fibonacci sequence, which builds uncertainty into the larger numbers). And stories should be able to be sized with big numbers. I've never been on a team comfortable with more than a 7, at least not since my first agile experience, where we all took agile/scrum training together for a few days. That team would frequently give things like 21 or 30 or 50 points, as appropriate. It's also the only place I've ever seen a burndown chart that looked like it should. Everywhere else, it's flat until the last day and then drops to zero as all those "it's a 7, I promise" stories get carried over to the next sprint for the 3rd time.


I haven't done story-point estimating in years, but at the time, an 8 was rarely acceptable and a 13 surely was not. Estimates that high were basically saying "this story is too big or too poorly defined to estimate accurately," and we'd try to break it down into several stories of 5 points or less.

The vast majority of our stories were 2, 3, or 5 points.


Now you are making exactly the same mistake the article starts out with: comparing story points outside your own team, with zero context for how much 8 points represents.

How big your points are makes no sense at all outside your own team. It is a relative measurement: 8 could mean 8 lines of code, 8 REST endpoints, 8 database columns, or 8 interviews with customers. It certainly should not mean 8 days.


I guess I am, but the point I was trying to make is that there was a fairly small number of point values we were fairly confident in, and above that it quickly got into "we can't even take a guess" territory.

Correct that the absolute point values aren't relevant, but it would seem odd to me, with Fibonacci increments, for teams to find values such as 13, 21, or higher really useful unless they put a lot of research into the number. For us, it was "read the story card and give your estimate," so it was entirely a gut-feel sort of thing.

And yes, when you really boiled it down (though rarely admitted), for most people 1 point = 1 day. So anything over 5 was unlikely to get done in a week, and therefore it needed to be broken down, as we ran one-week sprints.

I'm not endorsing any of that, by the way. I thought planning poker was pretty arbitrary, but it was the gospel and not to be questioned.


Not really.

The point there is the granularity. If 8 points to you is fixing a minor spelling mistake in your docs, what value is there in having anything smaller than 8?

If 1 is "build the entire backend" then how can you represent anything smaller?


You're just proving their point


I agree - countless times I've had to beat this into people's heads, even people I'd consider smart: points <> time.

After the sprint you can kind of infer the time, but it should not be a guideline for the next estimations unless the tasks are things like "fix typos."


Ultimately, we are bound by time, not complexity. Why does it matter how complex a task is? The product managers and customers won't care how hard we as engineers have to think or reason about a problem; to them, the only thing that matters is time until delivery.


So they are bad product managers and customers.

Time until delivery for good managers and customers is a range. Can you estimate getting 10 kg of potatoes from a grocery store that is a 35-minute round trip away? Can you say it will take exactly 40 minutes because you can pick up and pay in 5 minutes? I can't; I can say it will take between 40 minutes and 2 hours. There are always things like the card terminal stopping working, or getting stuck in traffic because of an accident.

Complexity in that example is the uncertainty: I expect high traffic and there might be an accident, but if there is less traffic and I hit all green lights, 40 minutes is going to be easy.

We all know bad managers and bad customers will expect me to get that bag of potatoes in 37 minutes, and then ask 10x why I did not drive over the police officer who was stopping traffic because of an accident, so they could have their potatoes on time.


I don't follow how we get from "they are bad product managers and customers" to... therefore time estimates are bad. I do not think it is unreasonable for our primary stakeholders to ultimately care about time. I also do not think it is unreasonable to give error bars in estimates, like "this project is uncertain, therefore estimates will be variable."


The bad managers are those who ask you for an estimate and then take it as a commitment.


"How much time will this take?"

"Not sure; approximately 30 minutes, but could also be 20 or 40 minutes. If we are very lucky then 10 minutes, but if we are very unlucky, maybe an hour or more."

"Spare me the details, I just need one number for the report."

"Uhm, 40 minutes?"

"You just said that it would be approximately 30 minutes, didn't you?"

"Yeah, but I wanted to add some safety margin..."

"If we keep adding large safety margins to everything, then the project will take forever. As you said, some tasks are completed faster, some tasks are completed slower, on average it will cancel out. I need your best estimate."

"Uh, okay, then 30 minutes. On average."

...the next week...

"So, you guys told me this would take 30 minutes, but it actually took 35. I think we need to have a serious talk about your performance."


"Time until delivery for good managers and customers is a range."

Sometimes it's not. In the gaming industry Christmas is a hard deadline.


No, it's still a range, the difference is simply that a good manager would plan such that Christmas is at the very far end of the range. Bad managers will plan with the optimistic end of the range, and then expect crunch time from exploited workers following their passion.


Story points aren't useful outside the team. They're for the team to help it figure out roughly how much stuff it can do each sprint. They shouldn't leak out of the team.


A sprint is a unit of time. How does measuring "complexity" (whatever that is) help in figuring out how much stuff you can fit into *time*?


I understand what you're saying - of course in some sense they're convertible. But the point is to not think about time when estimating, because if you estimate time you don't factor in things like other tasks, or holidays, or anything else. Or if you do, you have to spend ages trying to account perfectly for time.

Instead, suppose you estimate complexity (e.g. I think this task is a 3, just as a starting point; this task is roughly the same, so it's also a 3; this one is similar but will need almost as much testing again due to its difficulty, so we'll call it a 5; this one is very simple, not even half as difficult as the first one, so it's a 1; and so on). Keep that up for a few sprints, then figure out how many points fit into a sprint. You automatically factor in everything else (like "I have to log into Okta ten times a day" or "people keep getting pulled into meetings") through practical observation of what got done, and you get better at predicting what you'll be able to achieve in a sprint.

It's not perfect; it just removes the need for certain entire jobs devoted to accounting for time, which you can spend on another developer instead, while also being a reasonable measure of what you'll get done, and only takes about an hour every two weeks.
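
To make the mechanism concrete, here's a minimal sketch of that feedback loop (every number is invented; nothing here comes from a real tool):

    # Points completed in the last few sprints (invented numbers):
    completed = [27, 31, 24, 29]

    # Velocity: trailing average of what actually got done. This silently
    # absorbs the meetings, Okta logins, holidays, etc.
    velocity = sum(completed) / len(completed)  # 27.75 points/sprint

    # Candidate stories, estimated relative to past anchors:
    backlog = [3, 3, 5, 1, 8, 5, 3, 2]

    # Commit stories until the running total would exceed the velocity:
    commit, total = [], 0
    for points in backlog:
        if total + points > velocity:
            break
        commit.append(points)
        total += points

    print(commit, total)  # [3, 3, 5, 1, 8, 5] 25

The point of the sketch is that nobody ever converts points to hours; the observed throughput does the calibration.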


If the problem is "we're not accounting for holidays in our time estimates", I can't see how the solution could possibly be "time is obviously a flawed measure, so we'll use this other measure which has this hazy relationship with time, but we all agree that it's definitely not time, although we have trouble saying what it is"


Personally, I find it much easier to say 'normally I would get this to you by Wednesday of next week, but we have that offsite and my wife's parents are visiting, so does Friday work for you?' than 'this is a 3, so I guess this fits in this sprint?'

Changing units and names of things really seems like a deliberate attempt to rob the discussion of any actual meaning: just a comfortable, empty formalism that masks the fact that we aren't trying to come to grips with the most difficult parts of our job.


But you don't estimate holidays in... you estimate how long it would take if someone picked up the task Monday morning and worked on it full time.

If someone picks up a task on Monday and then has 20 other meetings, the estimate is still the same; he just continues after those 20 meetings, and you don't care about that when estimating.

The only thing is, if at the end of the sprint the guy says "I started X, then I had 20 meetings, so I did not make it" - well, you just accept that, or you don't put the guy in 20 meetings.


> But you don't estimate holidays in... you estimate how long it would take if someone picked up the task Monday morning and worked on it full time.

You estimate tasks that way, but you estimate your capacity to do tasks based on your actual track record, which will include holidays and other things.


Functionally, you've just created extra steps and confusion by not calling it a time estimate, or at least something equivalent to time. Even with a real time estimate you shouldn't be planning by going "well, you work 80 hours this sprint, so we plan 80 hours" - you should be doing the same exercise of looking at how many "hours" were completed in the last few sprints and planning based on that number. If we're in agreement that these numbers are for the team only, then it shouldn't matter if they consistently under- or over-estimate the hours; nobody outside the team should know or care how many "hours" they complete in a sprint.

The confusion part is that by calling it "complexity" and saying it's not a time estimate, you've muddied the waters on what it is; people will debate the definition and intentionally differentiate it from actual time. I've seen this before: the points-per-sprint never stabilizes because teams have cards where "that's a 1-point card because it's simple, but it will probably take a week." And then suddenly they're ignoring the points during planning and coming up with actual time estimates instead (which also don't work, because they don't track those against multiple sprints).


I think time variability increases with the level of complexity. In this context I see the idea of task complexity being related to uncertainty in the time estimate. This makes it fit nicely with the Fibonacci sequence.


Time variability also increases with the time estimate for a task. If a task is "about two weeks", then it might be 1.5 weeks or it might be 4 weeks.

But if a task is about 1 day, it may take 4 hours or 4 days, but it will almost certainly not be 1 month.

Points are always just a proxy for time, and work the same way. No matter what anyone claims, as long as you use points to plan time-bound sprints, points are directly a measure of time.
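
A toy sketch of that multiplicative-error idea (the 0.5x-2x error band is an assumption for illustration, not data):

    low_factor, high_factor = 0.5, 2.0  # assumed error band

    for estimate_days in [1, 5, 10]:
        low, high = estimate_days * low_factor, estimate_days * high_factor
        print(f"{estimate_days}d estimate -> {low:.1f}-{high:.1f} days")
    # 1d  -> 0.5-2.0 days
    # 5d  -> 2.5-10.0 days
    # 10d -> 5.0-20.0 days

If errors scale with the estimate like this, the absolute spread grows with the size of the task, which is exactly the behavior attributed to points.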


Yep, that's completely accurate.

At the same time, that's one of the reasons to prioritize removing as much uncertainty as possible.


Points are about uncertainty; they're the difference between:

- 3-4 weeks

- 3-9 weeks

Product managers can get their head around that.


Yes, project managers can easily have that level of comprehension, but it is rare to meet a project manager who understands that a time range is something like a confidence interval. That is, if it is estimated that a task will take 3-9 weeks, with some probability (say 10%) it will take an even shorter or longer amount of time. There is uncertainty encoded in the time range, but the time range itself is also uncertain.

Fundamentally, the problem is that project managers set deadlines based on statistical estimates from developers. Despite the fact that they set the deadline and do not understand the dispersion, they want developers to be responsible for misses. Sometimes people mistakenly believe that there is some magical practice that can eliminate the uncertainty from estimation. You can make predictions with things like story points and achieve a certain amount of accuracy with a certain amount of dispersion. Statistically, it is the longitudinal behavior that can be predicted (sprint success rate at a specific velocity on a stable team), but we focus on cross-sectional details (we missed this sprint).

Project management is generally not considered a field requiring statistical expertise, but modeling the reality of the work requires it.
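
To illustrate the longitudinal point with a quick Monte Carlo (the lognormal error model, the sigma, and all numbers are assumptions, not measurements):

    import random

    def sprint_succeeds(tasks, capacity, sigma=0.5):
        # Each task's actual cost = estimate * lognormal noise (assumed model).
        actual = sum(p * random.lognormvariate(0, sigma) for p in tasks)
        return actual <= capacity

    tasks = [3, 5, 2, 8, 5, 3]  # 26 points committed against capacity 30
    trials = 10_000
    wins = sum(sprint_succeeds(tasks, 30) for _ in range(trials))
    print(f"success rate: {wins / trials:.0%}")
    # Any single sprint can miss (the cross-sectional view); the success
    # *rate* across many sprints is stable (the longitudinal view).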


Why not just say that, then? It'll take 3-9 weeks. You can then just add all the mins and maxes and get a full range.


No, points mash together size and uncertainty. A task that's "3-9 days" will have fewer points than a task that's "3-9 weeks". And a task that's "3 months give or take a week" will have more points than either.

Of course, there's actually no such thing as a "3 months give or take a week" estimate for a task. It's basically impossible in programming to have a task that takes that long with that low a level of uncertainty. So in reality, time estimates have the same properties as points: the higher a time estimate, the more uncertainty it represents.
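
One caveat on the grandparent's "add all the mins and maxes": if task errors were independent, the straight sum would overstate the combined range, because worst cases rarely all line up. A rough sketch under that (idealized) independence assumption:

    import random

    ranges = [(3, 9)] * 3                       # three "3-9 week" tasks
    naive = (sum(lo for lo, _ in ranges),       # 9 weeks
             sum(hi for _, hi in ranges))       # 27 weeks

    totals = sorted(sum(random.uniform(lo, hi) for lo, hi in ranges)
                    for _ in range(10_000))
    p5, p95 = totals[500], totals[9500]
    print(f"naive: {naive[0]}-{naive[1]} weeks; "
          f"simulated 90% interval: {p5:.0f}-{p95:.0f} weeks")  # ~13-23

In practice task errors are correlated and skewed, so real totals land somewhere between the naive sum and the idealized interval.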


If they're not time, why use numbers? Use fruits: easy peasy, it's a lemon. A really tough story, a coconut. You can't add them in any case, because they're not time.


I like this, I'd advocate for a fruit-based task system - although I suppose the exact fruit ranking would depend on the team.

- easy peasy: lemon

- easy but needs careful handling: kiwi

- regular but boring: red delicious

- regular, who wouldn't want to take one of these?: mango

- large task, risk of splash damage if mishandled: watermelon

- tough to crack, needs time or a hammer: coconut

- technically we'll do this, but not really our job: tomato

Edit: I am sad that emojis aren't allowed in comments, though it's understandable.


Just because they're not time, doesn't mean you don't want to add them. Imagine you have an empty basket, and you're not sure how much fruit you can toss in it. So the first time, you just start tossing stuff in until it's full. The exact number of lemons, coconuts, etc will vary. But after a few rounds, you'll get a feel for how much of each you can get into the basket. That's story points. You feel your way into a groove where the team gets a more concrete sense about how much it can get done in a sprint, given the variability of the work, external projects/distractions, and the makeup of the team.

Story points get a bad rap because a lot of engineering managers don't get scrum and just see a convenient way to measure productivity. Which story points are absolutely not meant to do, outside of the team itself setting its own sprint goals.


My experience is that story points get a bad rep because they don't mean anything unless you use them as an explicit proxy for time. There's no way to say in reality "this task is small" unless you have some idea of how long it takes. Additionally, this concept of velocity makes no sense, because a task that's big for me might be small for someone else on the team; so we either pre-assign tasks and set points based on assignment (and then have problems if we switch the assignee for any reason), or we assign "generic points," which end up not meaning anything at all if the team is not very uniform (e.g. all seniors with similar skills and ownership of mostly the same code, or all juniors).

Additionally, all methodologies tend to discourage correcting point values after the fact. That makes the process of deriving time estimates (velocity) even more error-prone, because it conflates uncertainty with mistakes. That is, you can correctly estimate a task at 8 points and finish it in four weeks, or you can incorrectly estimate a task at 3 points and finish it in four weeks. That doesn't mean the team has a velocity of about 1.4 points/week; it means it has a velocity of about 2 points/week but made a mistake with one task.


>My experience is that story points get a bad rep because they don't mean anything unless you use them as an explicit proxy for time.

So when you shop for clothes, Small/Medium/Large are useless? You require precise measurements for every item, and they have to be exactly the same size across manufacturers, or else sizes have no utility for you? The reality is that a Large can be large on you in different ways, even if it's a t-shirt. And software complexity has a lot more dimensions than a t-shirt. The utility of story points is that they allow a team to create a rough idea of their capacity over a sprint, so that they don't consistently under- (or more commonly) over-commit.

If you try to use story points purely as a uniform proxy for time, of course they're going to be useless, because you can always just use time instead.


Of course small/medium/large mean something; they are an approximation of size/dimensions. But story points, adherents claim, are not a measure of time at all! They are not "an approximation of time"; they are, so it is claimed, supposed to be unrelated to time and to measure "complexity" instead.

And I agree that a task can be large either because you know what must be done and there is a lot of work, or because you're not sure what needs to be done yet. But conflating those two things as "8 points" or whatever is just not helpful.


Story points are also "an approximation of size/dimensions." If my team has consistently deployed 25-35 story points per sprint for the last three sprints, it's reasonable for me to assume that next sprint they will also be able to complete about 30 story points of work. By contrast, knowing that they worked a combined total of 300 hours on average doesn't help me at all. And accounting for uncertainty is important, which is one reason a Fibonacci sequence is commonly used. The general rule is to go up one in the sequence if the team is uncertain. The whole purpose of story points is to avoid having to track things like uncertainty separately. It's like the Markov assumption, the information to get to the current point estimate is baked in. It is useful (essential, even) to incorporate fuzzy concepts like perceived complexity or uncertainty without bogging the team down trying to measure them precisely and explicitly.
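
As a toy illustration of that "go up one in the sequence" rule (the helper and its round-up behavior are hypothetical, not standard scrum tooling):

    FIB = [1, 2, 3, 5, 8, 13, 21]

    def size_story(gut_feel, uncertain=False):
        # Snap the gut-feel number to the scale, rounding up.
        i = next(i for i, v in enumerate(FIB) if v >= gut_feel)
        if uncertain and i + 1 < len(FIB):
            i += 1  # uncertainty is folded into the point value itself
        return FIB[i]

    print(size_story(4))                  # 5
    print(size_story(4, uncertain=True))  # 8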


> If my team has consistently deployed 25-35 story points per sprint for the last three sprints, it's reasonable for me to assume that next sprint they will also be able to complete about 30 story points of work.

And if the last few sprints they've completed between 5 and 30 points, do you believe they'll complete around 17.5 points next sprint?

Now, if the team is good at estimating (which they are if they get consistent results between sprints), then they can tell you feature X is 8 points, Y is 5 points, and Z is 15 points, and you can conclude that they will finish X, Y, and Z next sprint. But they can exactly as well tell you that X will take around 3 days, Y around 2 days, and Z around 5 days, and you can reach the same conclusion.


>And if the last few sprints they've completed between 5 and 30 points, do you believe they'll complete around 17.5 points next sprint?

I don't know, what did the team figure out in retro? Was the big difference a real underestimation, or was there some kind of unforeseen blocker? I've never seen that big a variation, but anything's possible.

If it makes you feel better to measure team velocity in something you call "days" instead of story points and it works for your team, more power to you. But don't fool yourself that you're talking about actual days. At best you're talking about "probable days", and how many days it actually takes will depend on a lot of things, including unknowns and who takes the story (are "Bob-days" the same as "Carol-days"?). So you'll end up with a measure of days that is very team- and uncertainty-dependent, and at that point it's better to just use story points and admit that it's not a universal measure and doesn't need to be. Not to mention that by using days you'll invite confusion between calendar time and time-on-task.


If you try this (and I have, just not with fruits), someone will complain they can't graph fruits. You'll tell them that's the point. They won't listen, so they'll map fruits to numbers, and now you have the same problem anyway.

My personal preference is to use time estimates with some uncertainty. A day or less. 2-3 days. A week at most.


> My personal preference is to use time estimates with some uncertainty. A day or less. 2-3 days. A week at most.

In the project management world, there is an assumption that overestimated and underestimated tasks will even themselves out, so that the total estimate equals the actual time needed. Sad to say, the accuracy of estimates doesn't follow a normal distribution in software development.


I had one manager who used time in orders of magnitude. He’d ask, “is it a day, a week, a month, or a year?”


I've done exactly this in the past, and then when someone asks how long the project as a whole is going to take it's easy enough to give a range. If someone says a task is going to take hours that's a range of 1-6 hours, if they say it's months that's 1-12 months. If you want more certainty in your estimate then you're going to have to give us some time to break things down.
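
In sketch form (the hours and months ranges are the ones above; the days/weeks bands and the workday conversions are assumptions for illustration):

    RANGES_IN_WORKDAYS = {
        "hours":  (1 / 8, 6 / 8),   # 1-6 hours, at 8h/day
        "days":   (1, 5),           # assumed band
        "weeks":  (5, 20),          # assumed: 1-4 weeks
        "months": (21, 252),        # 1-12 months, ~21 workdays each
    }

    def project_range(magnitudes):
        # Sum the per-task (min, max) ranges into a whole-project range.
        return (sum(RANGES_IN_WORKDAYS[m][0] for m in magnitudes),
                sum(RANGES_IN_WORKDAYS[m][1] for m in magnitudes))

    print(project_range(["days", "weeks", "hours"]))
    # (6.125, 25.75) workdays -> report "roughly 1-5 weeks"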


That is one of the complications: one would think developers are smart, as in capable of abstract thinking, so they should understand (just like with all the other numbers humanity made up) that "the numbers we call story points do not have the property of addition, and they are not convertible to time."

Properties of whole numbers:

    Whole numbers are closed under addition and multiplication.
    Zero is the additive identity element of the whole numbers.
    1 is the multiplicative identity element.
    They obey the commutative and associative properties of addition and multiplication.
    They satisfy the distributive property of multiplication over addition and vice versa.


Why use a measure that creates this footgun? Why should our task estimation require this much abstract thinking? Why not invent the measure so that it's not misleading? Numbers generally have well-understood properties. Using them in a way where the properties don't apply is asking to be misunderstood.


And this is how the entire math gets done in 5 simple JIRA tasks. :D


I use Halo difficulty levels: Easy, Normal, Heroic, Legendary.


I continue to treat story points as a measure of time, despite being told repeatedly they're definitely not time. I will continue doing this until someone can explain to me, in a way I can understand, what they actually are that is not time.


Because in the end, they are a proxy for time. We can call them "complexity" or whatever, but that doesn't help much with planning a time-boxed period of activity. So they end up meaning "time".


Having points not be a measure of time is a means of estimating how much work you think the team as a whole will accomplish, while diminishing the risk that a given estimate (delivered as a time range) will mutate into a "promise."

It's also a good way of communicating what you think the blend of known unknowns and unknown unknowns is.


Difficulty level?

You can't promise you can beat a game on hard 2x as fast as you can on normal, or 3x as easy.


Estimates never represented a promise in the first place. If you have someone who is holding you to your estimates, you have to address that.

Regarding difficulty, easy things aren't even expected to be faster than hard things. I'd rate a backflip as much harder than counting to 100,000, even though it wouldn't take nearly as long.


The identity of story points depends on what information you have. If you don't know your team's velocity then story points are only relative complexity. Once you have your team's velocity you can use that information to convert to time.


What is relative complexity? How can you compare the complexity of <changing the colors of one button> with <implementing a sorting algorithm>, other than by how long they might take?


Then it would be how long they might take relative to each other. To give a specific time you would have to know the stage of development for the product, the health and ease-of-use of the CI/CD pipeline, the current interviewing burden of the team, the rate at which production incidents are occurring, the number and length of meetings that the team members are in, etc., etc., etc. More junior developers generally won't do a great job of that. But more junior developers can usually estimate the time it would take them to do the task if they had absolutely nothing else to do and if their build/test/deploy pipeline were optimal. So with story points that's all they need to really worry about. In that way story points are a tool to help the team make better estimates in the face of varying degrees of expertise in individual contributors estimating time-to-complete.

By judging complexity and measuring velocity you get an estimate of time that intrinsically takes all of the variables into account. It's a powerful tool when used right.
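
A minimal sketch of that conversion (all numbers invented; the observed velocity spread is standing in for all of those variables):

    recent_velocity = [22, 30, 26, 28]   # points/sprint, observed
    backlog_points = 120
    sprint_weeks = 2

    fast, slow = max(recent_velocity), min(recent_velocity)
    best = backlog_points / fast * sprint_weeks    # 8.0 weeks
    worst = backlog_points / slow * sprint_weeks   # ~10.9 weeks
    print(f"roughly {best:.0f}-{worst:.0f} weeks")  # roughly 8-11 weeks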


> To give a specific time you would have to know the stage of development for the product, the health and ease-of-use of the CI/CD pipeline, the current interviewing burden of the team, the rate at which production incidents are occurring, the number and length of meetings that the team members are in, etc., etc., etc.

This is a strawman. When asked to estimate a task, people essentially always think in terms of "how long would it take if this were the only thing I was working on". Of course, when a junior dev gives an estimate like this, you don't put it into a Gantt chart and start planning release celebrations based on it: you add appropriate buffers and uncertainty based on who made the estimate.

> By judging complexity and measuring velocity you get an estimate of time that intrinsically takes all of the variables into account. It's a powerful tool when used right.

Again I ask, what is complexity, other than an estimate of time taken?

Also, "velocity" is just an average across people and sprints. This would only converge to a meaningful estimate of time IF people are consistently failing their estimates in the same way. If the error bar on the estimates varies wildly, taking the average doesn't do anything meaningful. I think this is much more common than consistently miss-estimating in the same way.

Not to mention, if these estimates of "complexity" don't take into account external factors, then they'll always be off by unpredictable amounts. Velocity measurements also fail to take this into account - so, when the team had a bad sprint because a member fell ill, or because there were extended disk failures, or whatever other external event, that goes into the velocity, as if this is some recurring event.


I did story points with a team of 12 over a two-year period of time, and it worked.


I did them too, with two different teams of 8-10 people, once for two years and the second time for about one year. We didn't lose anything when we gave up on points and simply went for time based estimates.


Points are almost worthless as they can be gamed.

We're judged on delivery. Measured by time. Complexity is arbitrary.

If points aren't time-bound, why am I limited in the number of points I can take? Every team had a max point load. If there's no stick for rollovers, then you're doing Kanban.


> Story points aren't time...

> ...burndown chart...

The x-axis of a burndown chart is time, right? So if you create a chart that measures points/time then you encourage the idea that a certain number of points can/should be completed in a day, ergo that points are a proxy for units of time. Otherwise what's the point in the chart?


If you're measuring foo/time, then foo is probably not time. Unless you're measuring some kind of acceleration.

Charts are supposed to be pretty and reassuring and go up and to the right. That keeps the managers happy!


Yeah, it seems fairly common for people/teams to follow the idea that any story of 8 or more points should be broken down into tasks of 5 or less. This simply doesn't make sense to me. If the simplest task is 1 point, is your most complex task really allowed to be only 5 times as complex? Story points usually follow an exponential increase for a reason; enforcing staying in the mostly linear portion is just pretending the complexity and uncertainty have been decreased.


The idea is that if a task is that large, can you really not break it down into smaller steps? Do we understand the problem well enough to implement it, or are we hand-waving over likely areas of complexity? Maybe if you tried to break it down you'd realise that the 21-point card is actually more like 10+ 5-point tasks, and you had just thought "big," not "I know what needs to be done and can size this accurately."

That doesn't mean these cases never occur, but it's worth checking whether it's actually several smaller, related pieces of work.


> Maybe if you tried to break it down you'd realise that the 21-point card is actually more like 10+ 5-point tasks

It didn't even occur to me to think of it this way, because in the times I've been exposed to breaking down tasks, the total number of points stayed constant: 13-pointers becoming an 8 and a 5, and 8-pointers becoming a 5 and a 3.


> Do we understand the problem well enough to implement or are we hand waving over likely areas of complexity?

Nailed it. That is exactly the right question to ask.


The use of the Fibonacci sequence is so pseudo-intellectual. It's completely arbitrary, but invoking Fibonacci makes it sound smarter or more justified somehow.



