Developing big and complicated systems quickly remains a difficult problem. Big, complicated systems cost a lot more per "function point" than smaller systems, and they famously seem to resist attempts to accelerate them. It often can't be done, even when a big organization is willing to spend unlimited funds to get results. However, the availability of unlimited funds should provide motivation for the entrepreneurial among us. I think these big projects CAN be accelerated, using ideas pioneered in open source projects.
The Mythical Man-Month is an analysis by Fred Brooks of the problems involved in truly large software projects, using the example of the effort that IBM started in the mid-1960s to build OS/360, the operating system that would go on to run most of the world's mainframe computers. Brooks noted that "Adding manpower to a late software project makes it later" (Brooks's law). He also pointed out that "Nine women can't make a baby in one month" - a reference to the problem with the idea that you can add more people and get productive "man months" out of them.
You have probably read this book. At a recent gathering of 25 CTOs, I asked who had read The Mythical Man-Month, and 23 hands went up. It's a great book, but it's dogma now, the conventional wisdom.
The statistics in Software Assessments, Benchmarks and Best Practices put numbers to the scaling effect. Productivity is measured at 15 function points per developer month for software having 100 function points, declining to 5 per developer month for projects having 10,000 function points, at which point 67 people were involved. The author notes that "projects having more than 100,000 function points are more likely to fail than to succeed." Typical staff size on these projects was 690. There lie dragons.
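To make the size of that effect concrete, here is a small arithmetic sketch in Python. It uses only the figures quoted above. The function name and the assumption that total effort is simply size divided by productivity are mine, not the book's, so treat the results as rough illustrations.

```python
def developer_months(function_points: float, fp_per_dev_month: float) -> float:
    """Total effort, assuming effort = size / productivity (a simplification)."""
    return function_points / fp_per_dev_month


small = developer_months(100, 15)    # small project at 15 function points per developer month
large = developer_months(10_000, 5)  # large project at 5 function points per developer month

size_ratio = 10_000 / 100            # how much bigger the large project is
effort_ratio = large / small         # how much more effort it takes
calendar_months = large / 67         # assumes all 67 people work in parallel throughout

print(f"small: {small:.1f} developer months, large: {large:.0f} developer months")
print(f"size grows x{size_ratio:.0f}, effort grows x{effort_ratio:.0f}")
print(f"roughly {calendar_months:.0f} calendar months for a staff of 67")
```

Under these assumptions, a system 100 times larger costs about 300 times the effort, which is the scaling penalty the rest of this post is about.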
Software development productivity has improved enormously in the past 30 years. Today's developer has fast hardware, dynamic scripting languages, instant access to tools, documentation, and advice over the Internet, and decades of experience in development process to draw on. As a result, a contemporary developer is able to produce a lot more "function points" per day than a developer could produce 30 years ago, or even 10 years ago at the height of the Internet bubble.
If one person today can do the work of 5 people from days gone by, that in itself ameliorates the problem. For any given project, our staff only needs to be one fifth as big. We see small entrepreneurial teams doing work that big companies did 15 years ago, and bootstrap entrepreneurs producing products that would have required millions in venture funding ten years ago. If you don't want to fail, don't take on big projects. You can wait for them to become small. However, this is an unsatisfying solution, because our software becomes continually more complicated, and over time, the functionality expected by the user increases just as fast as our developer productivity. To produce great systems, we must slay the dragons.
I should also note that the scaling problems are actually worse for small projects. For example, when you go from 1 person to 2 people, you get a big decline in developer productivity. I usually estimate that you need two developers working 40 hours per week to replace one developer working 60 hours per week. That is exactly why good developers work such long hours. Improved scaling ratios will have a disproportionate benefit for small teams that need vacations.
I am one of the few development managers who feels free to ignore Brooks's law. I will add a lot of developers to any project, if I can afford it. My limits are imposed not by productivity, but by the fact that I have limited funds for my own product development, and that my clients, like me, are cheap entrepreneurial bastards.
Linus Torvalds provides a more famous example of a developer ignoring Brooks's law. Although much of Linux is built by a small core of very dedicated developers, there are currently about 1,000 kernel contributors. This has produced a steady, rather than a declining, scaling effect. The kernel has consistently grown by about 10% per year.
I think that Brooks is fundamentally wrong about the cause of the scaling problem. He points the finger at communication problems. If you have N people working on a project, you have on the order of N squared communication channels (N(N-1)/2 pairs, to be exact), and each person needs to spend more time communicating. Furthermore, if a new person tries to come into a big project, they have to get a lot of information from a lot of different people, so the ramp up time is increased.
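The channel count in Brooks's argument is easy to compute. The sketch below counts distinct pairs of people, using the staff sizes quoted earlier in this post as sample inputs. The function name is mine.

```python
def channels(n: int) -> int:
    """Number of distinct person-to-person pairs among n people: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Team sizes: a pair, a small team, and the two staff sizes cited above.
for team_size in (2, 6, 67, 690):
    print(f"{team_size:>4} people -> {channels(team_size):>7} possible channels")
```

Going from 67 to 690 people multiplies the headcount by about 10 and the possible channels by about 100, which is why Brooks wants a hierarchy that prunes most of them.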
Brooks proposes various tactics to reduce the complexity of communication, including having a master architect produce a well-documented architecture and manual, and having designated "tool makers" to supply each team and subteam. Essentially, he's recommending that we translate a network of technical communications into a hierarchy with a lot fewer connections.
If we compare Brooks' recommendations to typical practices for Linux development, there isn't much overlap. Code is accepted hierarchically, moving from "contributors" to "maintainers", but this is the output, not the raw material. It's done to control quality, not to provide information. Tools and architecture ideas are expected to come from a variety of sources.
I think the scaling problem is not a communication problem; it's a dependency problem. It's not necessarily true that when you work on a bigger system, you need to do more communication. However, it is always true that you depend on more things, so you are more frequently waiting for something else to get finished. Take a close look at a development team of 100 people. At any given time, 50 of them are waiting for something from someone else. That effect alone can account for the observed 50% degradation in productivity compared to a six-person team, where a blocked team member can demand, or build, an immediate fix.
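Here is one way to picture the dependency argument as a toy model. Everything in it is an assumption made for illustration, not measured data: each developer relies on some number of components owned by other people, each component is independently unfinished or broken with some probability at any given moment, and a developer is blocked if any one of them is.

```python
def blocked_fraction(dependencies: int, p_unavailable: float) -> float:
    """Chance a developer is waiting on at least one dependency.

    Toy model: each dependency is independently unavailable with
    probability p_unavailable, and a single unavailable one blocks the developer.
    """
    return 1 - (1 - p_unavailable) ** dependencies


# Hypothetical parameters: more dependencies as the system grows.
for deps in (1, 3, 7, 14):
    print(f"{deps:>2} dependencies, 10% each unavailable -> "
          f"{blocked_fraction(deps, 0.10):.0%} blocked")

# If a blocked developer can fix the component, outages get shorter, which this
# model represents (again, by assumption) as a lower per-dependency probability.
print(f" 7 dependencies,  2% each unavailable -> {blocked_fraction(7, 0.02):.0%} blocked")
```

With these made-up numbers, about seven dependencies are enough to leave roughly half the team waiting, and shrinking the time each dependency stays broken helps far more than trimming communication would.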
From this point of view, you can see that open source projects have a huge scaling advantage because all code is shared. If someone is waiting for a component, and they are frustrated enough, and talented enough, they can just fix the problem. The code is all shared, anybody can build and fix any component, and the responsibility for critical components can move around. The answer to the dependency problem is less hierarchy, and fewer official tool builders, not more.
If we do grant some truth to the idea that bigger systems require more time from each developer for communication, we can see that there are ways that modern developers reduce this time. Most notably, they share code. Code is a relatively crude form of communication compared with deliberate documentation, but it's accurate, and it's immediately available.
Furthermore, Internet projects tend to communicate in writing, on tickets, blogs, mailing lists, and wikis that are accessible to all team members. This changes the communication from a network of conversations that take time from a lot of people, to a simple text search that a team member can do individually. Perhaps this is why the time spent on conference calls is, in my observation, a good indicator of management problems.
So, what are the takeaways for me?
- It is possible to be productive in big groups, and I think it could be a lucrative area of study.
- It is important to share code, and very important to accept new components and fixes not only from designated developers, but from the users of that code.
- It is important to share ideas in writing.
- Control quality by being hierarchical and rigorous about how you test and accept changes, not how you generate them.