Sometimes you have to break the rules

Posted on: 31 October 2022
By: David Hersey

As I mentioned in a previous post, Agile was started by a widespread group of individuals who saw the way things were and weren't working and began to creatively break the rules to get better results.

Of course, Agile has grown in the decades since. It has started to resemble a detailed set of rules for achieving consistent practices and processes across increasingly large sections of organizations. So now it's not uncommon to hear people telling teams or individuals that what they're doing doesn't follow the rules of Agile. In many cases, that is a bad thing: there is value in consistency, and in having a common language and intersecting processes, when you try to bring Agile to a larger group such as a Team of Teams.

But what I want to write about here is how to strike a balance: keeping consistency and coordination while removing the waste that accumulates when people follow rules in situations where those rules don't necessarily fit. A major practice in Agile is continuous inspection and adaptation.

The plan-do-check-act and Kaizen cycles that come from lean manufacturing inspired the Retrospective. Once a sprint, it gives a team the opportunity to look at the results they are accomplishing together and brainstorm ways to adapt the agile framework to the specific needs and opportunities that exist at the team level, in order to achieve better results with less waste.

Breaking the rules at the Team of Teams level

A few years ago, I was coaching an organization of roughly 60 individuals organized into a team of teams. We were not using SAFe; this was not an Agile Release Train, but it was a group of nine cross-functional teams of roughly five to seven people each. Each was aligned to a different facet of a common product area, and the product was settling equities transactions at one of the most active centers of the world's financial markets. There were at least 35 separate software systems, with lifetimes that began anywhere from the late 1970s to the modern day. They were written in at least two dozen technologies, including some of our old friends from the client-server days of the '90s, mainframe code, and various flavors of modern mainstream languages such as C++, JavaScript, Java, etc.

As you can imagine, supporting and changing each of these systems required both specialized technical skills and solid knowledge of the business operations behind them. That knowledge, predictably, had accumulated in the minds of some of the more senior engineers on staff. So far so good, right? On the surface, it seemed that organizing this large department into a group of smaller teams and aligning each one to a different business area made sense. There were too many systems to align teams to systems areas; there would not have been enough people to do the work. So instead, teams were aligned to the major customer areas. As a result, the work that came up in the backlog would involve one or more of the different systems that the group collectively supported. The features were T-shirt-sized, placed in an overall backlog, and then broken down and picked up by the individual teams based on capacity and priority.

A tower of Babel?

As we got rolling, we noticed some things. Team velocity would vary greatly. Even though all of the engineers were talented, and it seemed as though the agile practices were being followed, the actual work produced by a given team would fluctuate significantly sprint to sprint or month to month.

The first thing we did was assess how well the teams were following the agile practices. They all seemed to be following “the rules” correctly. It didn't look like more process would solve the problem. In fact, when we tried adding process, it made things worse in many cases. Customers were also frustrated because the work was not predictable. They had been used to this department, in its pre-agile state, having greater reliability, if not greater speed.

At this point, we took a step back and looked at the complexity of what was going on. If you view the entire department as a unit, you had roughly 60 people maintaining over 35 systems written in two dozen technologies, serving different customer groups. The level of complexity of the work that could come through, and the variety of systems each request would touch, are not uncommon in the corporate world, but they are definitely outside the boundaries of what was envisioned when many of the Agile frameworks were created. The early frameworks tended to focus on greenfield development work, not complex enhancements to a diverse group of critical systems, and these roots remain.

Finding the waste

The next thing we did was to find out where the waste was in the system: the root causes of the blocks that kept accumulating. We had every individual in the entire department keep track, for five days, on a little piece of paper, of the number of times they encountered one or more of the most common Lean Wastes: waiting, task switching, handoffs, and unnecessary work. This did not take much effort; it was easy to tell when you were encountering one, because you would need to put down or delay your work when something was stopping you. We also asked them to quickly note “why”.

Then we tallied everything up, and after just a few days of logging it became clear that the dominant reasons were either 1) being interrupted to answer questions or 2) needing to wait or task-switch due to lacking key information. The people being interrupted were the people who had the information, and the people waiting for information were the ones who were trying to do a task but didn't have the technical, business, or operational knowledge needed to carry it out by themselves.
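To make the tally concrete, here is a minimal sketch of how paper logs like these could be counted once transcribed. Everything in it is an assumption for illustration: the entry layout, the engineer labels, and the sample entries are invented and are not the department's actual data.

```python
from collections import Counter

# Hypothetical transcribed log entries: (person, waste type, reason).
# All values below are made up for illustration only.
entries = [
    ("engineer_1", "waiting", "lacking key information"),
    ("engineer_2", "task switching", "interrupted to answer questions"),
    ("engineer_1", "task switching", "lacking key information"),
    ("engineer_3", "handoffs", "work passed between teams"),
    ("engineer_2", "task switching", "interrupted to answer questions"),
    ("engineer_4", "waiting", "lacking key information"),
    ("engineer_3", "unnecessary work", "requirement changed"),
]

# Count how often each waste type and each stated reason appears.
by_waste = Counter(waste for _, waste, _ in entries)
by_reason = Counter(reason for _, _, reason in entries)

# The most frequent reasons point at the likely constraint.
top_reasons = by_reason.most_common(2)
for reason, count in top_reasons:
    print(f"{count:>3}  {reason}")
```

Counting by reason as well as by waste type is what surfaces a pattern like the one described above: the same two reasons keep showing up behind different kinds of waste.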

Furthermore, it became clear that those who had the knowledge were the more senior engineers who had naturally taken on the most significant parts of a particular requirement. Since they were also the ones being interrupted the most, their work was lagging behind the team, and everybody was waiting for them. We quickly realized what we had was a knowledge constraint across the entire department, one that was complex and unpredictable.

Rule-breaking round 1

Armed with this information, we did a couple of things that cut right against the grain. The first thing we did was to reclassify the roles of all of the senior people who had the knowledge – they were no longer to take primary engineering responsibility. Instead, their role became Teacher. Everybody else in the department was asked to identify two to three areas of either business or technical knowledge that they wanted to learn. Now when a team picked up work, the Learners were to take primary responsibility for doing the work, supported by the Teachers in either a mentoring or pair-programming capacity.

This immediately slowed things down, as you might expect. But in the ensuing three months, things got a lot faster as the learners and teachers began to work together. And by the end of the first quarter we had doubled the overall departmental throughput. How did we do it? We broke the rules.

Rule-breaking round 2

Changing people's roles like we did was arguably a minor tweak and is probably acceptable within most Agile frameworks as something that teams may decide to do based on their inspect-adapt cycle. But we still had a problem with capacity. Teams were not evenly loaded, and the load changed week to week and month to month. This was because customer needs were arriving in business priority order, not aligned to the teams' capacity. There was no guarantee that the most important requirement would have a team available with the skills to implement it. So then we decided to try an experiment:

What if there were no fixed teams? What if instead, we looked at the entire department as a flexible pool of skills and knowledge which could be dynamically allocated to emerging requirements? What would we give up, and what would we gain? Would we still be agile? We asked ourselves that question and concluded that yes, we would be. We also recognized that there were three major kinds of needs coming through from the business. Some were smaller, shorter needs that one or two engineers could handle in less than two weeks. Others were more substantial: they might involve different systems, and they could be turned around in one to two months. And then finally there were a few larger projects, which might span many months.

Rocks in a box

Have you ever thought about the best way to load up a box full of rocks? When I was in computer science school, we had this problem posed to us by a professor. And it turns out there's an algorithm for it. If you have a bunch of rocks of various sizes, and a box with just enough volume to hold them, then theoretically the rocks should fit, right? You just throw them in the box and you're done. Well, I don't know if you've ever tried that, or if you've tried to get a package full of electronic components back into its box so you can return it to the retailer. It isn't that easy. If you do the obvious thing and just put them in willy-nilly, there are going to be rocks sticking out the top. It turns out that what you have to do is put the big rocks in first and then fill in with all the little rocks. If the volume is used efficiently, the rocks fit in the box.
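The "big rocks first" idea can be sketched with a well-known bin-packing heuristic, first-fit decreasing. This is an assumption about the kind of algorithm meant here, not necessarily the one from that class. The story has a single box; the sketch uses several equal-sized boxes so the difference shows up as a box count. The rock sizes and the capacity are invented.

```python
def first_fit(sizes, capacity):
    """Put each rock into the first box with enough room left,
    opening a new box when none of the existing ones fits."""
    boxes = []
    for size in sizes:
        if size > capacity:
            raise ValueError(f"rock of size {size} cannot fit in any box")
        for box in boxes:
            if sum(box) + size <= capacity:
                box.append(size)
                break
        else:
            # No existing box had room, so start a new one.
            boxes.append([size])
    return boxes


def big_rocks_first(sizes, capacity):
    """First-fit decreasing: sort largest to smallest, then first-fit."""
    return first_fit(sorted(sizes, reverse=True), capacity)


# Hypothetical rock sizes; their total is exactly three boxes' worth.
rocks = [2, 5, 4, 7, 1, 3, 8]
willy_nilly = first_fit(rocks, capacity=10)
sorted_first = big_rocks_first(rocks, capacity=10)

print(len(willy_nilly), willy_nilly)
print(len(sorted_first), sorted_first)
```

With these made-up sizes, packing in arrival order needs four boxes, while packing the big rocks first fills exactly three. The heuristic is not guaranteed to be optimal in general, but it usually wastes far less space than arrival order.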

Applying this, we quickly classified the business needs into small, large and really large. We would plot them out in time on a big board, then dynamically form teams from the flexible pool of differently skilled resources to address them, based on capacity.

For larger projects, teams would co-locate in dedicated team areas, with a Scrum Master for the duration of the work. They would work together for two to six sprints, then re-form and go back to their home area once the work was accepted by the customer. As each team was in its final sprint, each member was asked to indicate availability and skills in a big Google sheet, to help discover where they would go next. They would then dynamically form another pickup team to do the next thing, perhaps with the same people and perhaps with different people.

Smaller requests would be handled by one or two people working together for 1-5 days, based on the available skills and knowledge not consumed by the dynamic teams. After some initial confusion, the whole system began to flow a lot more smoothly. We doubled throughput a second time.
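The allocation described above was done by people with a big board and a spreadsheet, not by software. Purely as an illustration of the logic, here is a minimal sketch that staffs the largest needs first from a shared pool. The function name `form_teams`, the data layout, the engineer labels, and the skills are all hypothetical.

```python
def form_teams(needs, pool):
    """Greedy sketch: staff the biggest needs first from a shared pool.

    needs: list of (name, size, required_skills) tuples.
    pool:  dict mapping each person to the set of skills they offer.
    Returns a dict mapping need name to the people assigned; an empty
    list means the need could not be fully staffed yet.
    """
    free = dict(pool)  # people not yet assigned to anything
    teams = {}
    for name, _size, required in sorted(needs, key=lambda n: n[1], reverse=True):
        team, uncovered = [], set(required)
        for person, skills in list(free.items()):
            if uncovered & skills:
                team.append(person)
                uncovered -= skills
            if not uncovered:
                break
        if uncovered:
            teams[name] = []  # leave everyone free; this need waits
        else:
            teams[name] = team
            for person in team:
                del free[person]
    return teams


# Hypothetical pool and needs, invented for illustration.
pool = {
    "engineer_a": {"mainframe", "settlement"},
    "engineer_b": {"java"},
    "engineer_c": {"cpp", "settlement"},
    "engineer_d": {"javascript"},
}
needs = [
    ("small fix", 1, {"javascript"}),
    ("large project", 6, {"mainframe", "java"}),
    ("medium change", 3, {"cpp", "settlement"}),
]

teams = form_teams(needs, pool)
for need, people in teams.items():
    print(need, "->", people)
```

Sorting by size before assigning is the same big-rocks-first move: the large project claims scarce skills first, and the small fix is picked up by whoever is left.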

What have we done?

What we did here was selectively break some of the constraints that the canonical Agile framework had placed upon this department. First, we constrained the roles of certain team members in the interest of widening the knowledge bottleneck. Then we inspected. Did it help? Yes, throughput doubled after three months. Check. Next, we dissolved the concept of durable teams, effectively turning the department into a flexible resource pool, with short-term dynamic teams allocated on a capacity basis to business priorities. We began to pre-plan and pre-assign items in the backlog two to six weeks into the future to align capacity to priority, thereby reducing the waste of waiting for capacity.

Even though we broke a lot of Agile rules, each of these teams followed solid agile practices, whether it was just one or two people working together or a larger group for a longer period of time. They paired, tested continuously, and worked to completion. They had strong definitions of ready and done. The teams were cross-functional, and in every increment they produced potentially shippable product. Our continuous inquiry into which rules to keep and which to break quadrupled throughput over a six-month period.

Sometimes, you just have to break the rules.
