"We can do that later" - or can we?

Have you noticed how projects usually see their first wave of departures within the first year? Over a five-year span, several key people will typically have changed as well.

It follows that the person who compromises on sustainability is usually not the person who pays for it, and that makes such compromises easier to make.

More concretely, you can set traps at every step of the way, and you might do so simply because you do not understand the difference between small-scale and large-scale software development.

Why unsustainability is a problem in the first place

Let’s assume one person can create two issues per week. Sounds far-fetched?

Then two people can generate four issues per week, and 20 people can create 160 issues per month (counting four weeks to a month). That already sounds pretty rough, doesn’t it?

“But surely, the writer of this blog is just being hyperbolic for effect; this doesn’t happen”, I hear you say.

Suppose we agree that people are better than that and each person introduces only one bug a month. If your team has one person, the situation is fine: one person can surely fix one bug a month, right?

What about 20 people?

With 20 people, one issue per person per month adds up to 240 issues per year. That doesn’t sound so bad for 20 people, does it? It certainly doesn’t, as long as you fix them immediately.

Let us again be conservative and assume that fixing a bug takes on average about 3 days of work in total from various parties, including roughly 1 day of overhead for triage and coordination. That’s 720 days of work, or about three people working for a full year.

I’m still being unfair, aren’t I? Let’s assume we are even better and each issue takes only a day to fix. Now we’re spending “only” 240 days, roughly one person-year, fixing these issues.
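The arithmetic above can be written out as a small back-of-the-envelope model. This is a minimal sketch in Python; the four-week month and the figure of roughly 240 working days per person-year are assumptions chosen to match the numbers in the text, not measured data.

```python
# Back-of-the-envelope model of the bug arithmetic in the text.
# Assumptions: a month is 4 weeks, a year is 12 months, and one
# person-year is roughly 240 working days.
WEEKS_PER_MONTH = 4
MONTHS_PER_YEAR = 12
WORKING_DAYS_PER_YEAR = 240


def issues_per_year(people: int, issues_per_person_per_month: int) -> int:
    """Issues a team introduces in a year at a steady per-person rate."""
    return people * issues_per_person_per_month * MONTHS_PER_YEAR


def fix_cost(issues: int, days_per_fix: int) -> tuple[int, float]:
    """Return (person-days, person-years) needed to fix `issues` bugs."""
    days = issues * days_per_fix
    return days, days / WORKING_DAYS_PER_YEAR


# Scenario 1: two issues per person per week, 20 people, one month.
monthly = 20 * 2 * WEEKS_PER_MONTH

# Scenario 2 (the "conservative" case): one bug per person per month.
yearly = issues_per_year(20, 1)

print(f"Two issues per person per week: {monthly} issues per month")
print(f"One bug per person per month: {yearly} issues per year")
for days_per_fix in (3, 1):
    days, years = fix_cost(yearly, days_per_fix)
    print(f"{days_per_fix} day(s) per fix: {days} person-days, about {years:.1f} person-years")
```

Changing the inputs makes it easy to see how quickly the cost moves: the fixing effort scales directly with both the team size and the average cost of a single fix.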

In reality, it can be worse. A group of 20 people that compromises on everything, or is simply forced to focus only on feature work, can generate anywhere from 5 to 20 issues per week. Or maybe they let just one thing slip, which justifies the next one, and so on. Over a few years, this can add up to thousands of issues and thousands of days of work that you are passing into the future.

Remember that the numbers above wrongly assume that a many-headed monster’s ability to create complexity grows only linearly. Most people will testify that reality is much worse. Think exponential growth.

Instead, the number of problems a team can create grows non-linearly with the size of the team.

One might even say it has no upper limit.
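One common way to illustrate why growth is faster than linear is to count the pairwise communication channels in a team, an observation popularized by Fred Brooks in The Mythical Man-Month. With n people there are n(n - 1)/2 possible person-to-person channels, each a place where misunderstandings can arise. This grows quadratically rather than exponentially, so treat it as one contributing factor rather than the whole story; still, doubling a team roughly quadruples the number of channels. A minimal sketch:

```python
from itertools import combinations


def communication_channels(team_size: int) -> int:
    """Number of distinct person-to-person channels: n * (n - 1) / 2."""
    return team_size * (team_size - 1) // 2


for n in (1, 3, 8, 20):
    # Cross-check the closed form by enumerating every pair of people.
    pairs = len(list(combinations(range(n), 2)))
    assert pairs == communication_channels(n)
    print(f"{n:>2} people: {communication_channels(n):>3} channels")
```

Going from 3 to 20 people multiplies the headcount by less than 7, but the number of channels goes from 3 to 190.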

How to avoid the disaster

There is no upper limit to how much damage a big team can do without restraints. It can take as little as a month to create years of technical debt, and your project manager will not be there to pay for those decisions.

Every possible step must be taken to keep this monster from getting out of control.

Here are the steps:

Begin containing the monster from the moment the project starts, and keep doing it every hour of every day. The causes of a software project’s failure typically lie in decisions made long before the failure itself.
Never compromise on quality improvement work because of deadlines. You can get away with that in a project with 3 people, but not in a project with 20. There will never come a time when you undo the damage that 20 people are capable of. You should not expect any customer on this earth to pay for that work, and there will never be a point in the project where there is money or time for it.
Be aware that someone else will probably pay for your decisions. You’ll leave the project and move on, and somebody else will have to live with what you decided, so make your decisions sustainable.
There is no later; there is only now. The broken window theory states that if a system is allowed to fall into disrepair early on, it tends toward perpetual disrepair. More on broken window theory.
Disallow as much as can reasonably be disallowed. Agree on a limited set of technologies and paradigms, and keep narrowing that set as the project goes on.
Review all code and make sure it follows the agreed style and avoids anything needlessly complex. It’s absolutely fine to agree to use just the good parts of the language and disallow the rest.
Foster a culture where people speak up when something is in disrepair. Nothing destroys a project faster than people who no longer care about breaches in quality.
Rather than “see it, fix it”, try “see it, fix it together”. This is more sustainable for everyone.

Things that work for small teams do not work for big teams, especially if you are a project manager or otherwise in charge of priorities and of deciding what the team does. By extension, you cannot apply what you know about 3-person teams to multi-team environments of 20 or more people, or even to 8-person teams.
Do not over-triage bugs or quality issues that have been called out; try to fix bugs immediately as they occur, without extra overhead. Try to fold the fixes into overhead you already have. The overhead alone can be lethal.
Especially in large teams, do not make bugs part of the regular Scrum or Kanban prioritization flow. Bugs and quality problems must be fixed out of band, and there needs to be room to do so. You may treat bugs as regular work in small teams, but in larger setups this spirals completely out of control.
Listen to your devs when they tell you that a practice is unsustainable or express worries about quality. Make sure they have the space and time to fix those issues immediately. They might not tell you twice, and you cannot afford that.
The mindset for everyone needs to be: “Everything should be fixed all the time”. Not: “We shouldn’t fix anything unless told to”. Again, large projects are different.
In large projects, ignoring quality problems or prioritizing features over daily maintenance is gambling with the customer’s business. One possible outcome of that gamble is that the project hits a brick wall or becomes unworkable. Another is that changing it becomes so expensive that nobody can pay for it. Furthermore, the customer may have no idea of the ramifications of saying “we can do it later”.
Do not create breeding grounds for bugs like:
- Overly atomic ticketing schemes, where people can work in isolation from the big picture.
- Tickets or a culture where tickets act as borders or boundaries for work.
- “You can do that in another ticket”. People tend to move work to other tickets because it doesn’t “fit” into the current one. Avoid that overhead instead, and be brave enough to fix everything around your current work.
- “It’s okay not to do it in this PR” or “I’ll do it in another PR”. Avoid the overhead and just do it in one. This kind of mindset breeds bugs and useless work.
- Pressuring people.
- Deadlines.
- A culture of “focus on the ticket only” and demonizing working outside of it.
- Under-specified tickets, and planning meetings that consistently produce them. Tickets should follow a checklist which ensures that edge cases and failure conditions are explored and that upstream and downstream dependencies are identified and handled as well.
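One way to make “disallow as much as is reasonably disallowed” and “use just the good parts of the language” enforceable, rather than a matter of reviewer memory, is to encode the agreed subset in a check that runs on every change. The sketch below uses Python’s standard ast module; the three banned constructs (global, eval/exec, and a bare except:) are hypothetical examples of what a team might agree on, not rules from this article. In practice, an existing linter with configurable rules is usually preferable to a hand-rolled checker.

```python
import ast

# Hypothetical team rule set: calls the team has agreed not to use.
BANNED_CALLS = {"eval", "exec"}


class SubsetChecker(ast.NodeVisitor):
    """Collects violations of an agreed language subset."""

    def __init__(self) -> None:
        self.violations: list[str] = []

    def visit_Global(self, node: ast.Global) -> None:
        self.violations.append(f"line {node.lineno}: 'global' is not allowed")
        self.generic_visit(node)

    def visit_ExceptHandler(self, node: ast.ExceptHandler) -> None:
        # A bare `except:` has no exception type and swallows every error.
        if node.type is None:
            self.violations.append(f"line {node.lineno}: bare 'except:' is not allowed")
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name) and node.func.id in BANNED_CALLS:
            self.violations.append(f"line {node.lineno}: '{node.func.id}()' is not allowed")
        self.generic_visit(node)


def check_source(source: str) -> list[str]:
    """Parse `source` and return every violation, in source order."""
    checker = SubsetChecker()
    checker.visit(ast.parse(source))
    return checker.violations


sample = """
counter = 0
def bump():
    global counter
    try:
        counter = eval("counter + 1")
    except:
        pass
"""

for violation in check_source(sample):
    print(violation)
```

Run as part of code review or CI, a check like this lets the agreed subset shrink over time simply by adding entries, which matches the idea of narrowing choices as the project goes on.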
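The ticket checklist mentioned above could be made explicit, so that a ticket is not considered ready while any question on it remains unanswered. This is a minimal sketch; the Ticket fields, the CHECKLIST questions and the unanswered helper are hypothetical and not tied to any particular issue tracker.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Hypothetical ticket shape; real trackers name these fields differently."""
    title: str
    description: str = ""
    edge_cases: list[str] = field(default_factory=list)
    failure_conditions: list[str] = field(default_factory=list)
    upstream_dependencies: list[str] = field(default_factory=list)
    downstream_dependencies: list[str] = field(default_factory=list)


# Each checklist entry maps a Ticket field to the question it must answer.
CHECKLIST = {
    "description": "What is the change and why is it needed?",
    "edge_cases": "Which edge cases have been explored?",
    "failure_conditions": "How can this fail, and what happens then?",
    "upstream_dependencies": "What does this depend on?",
    "downstream_dependencies": "What depends on this?",
}


def unanswered(ticket: Ticket) -> list[str]:
    """Return the checklist questions whose fields are still empty."""
    return [question for name, question in CHECKLIST.items() if not getattr(ticket, name)]


ticket = Ticket(
    title="Add CSV export",
    description="Users need to export reports as CSV.",
    edge_cases=["empty report", "non-ASCII column names"],
)

for question in unanswered(ticket):
    print("Open:", question)
```

An empty list is treated as unanswered on purpose: if a ticket really has no downstream dependencies, writing that down explicitly (for example ["none identified"]) shows that the question was actually considered.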

Ultimately, because of the rules of chaos, large projects and large operations work fundamentally differently, and one should be careful when applying what works in small projects to large ones. More importantly, you can’t dump work into some eventual future and expect someone to be there to take care of it.

Continuously accepting technical debt becomes a problem so large that, left unchecked, it spirals out of control. It is entirely possible to create years of technical debt in a matter of months, and that debt can be resolved in exactly two ways: either somebody pays for it and the project continues to exist, or nobody pays for it, the project ends, and people who had nothing to do with its failure end up being blamed.

Lari Tuomisto · 10 October 2022 · software development, process, sustainability