In Memoriam: Frederick P. Brooks, Jr. and The Mythical Man-Month

It is with some sadness that I have read that Fred Brooks has passed away. Brooks (1931 - 2022) worked at IBM and managed a large team developing the IBM System/360 computers in the early 1960s. Brooks was thirty years old at the start of this project. He founded the Computer Science Department at UNC Chapel Hill in 1964, at the age of thirty-three, acting as its department chair for twenty years. He remained at IBM until 1965, however. During this one-year overlap he managed the team developing the operating system for IBM’s System/360, called — unimaginatively — OS/360. He is probably most remembered, though, for writing The Mythical Man-Month: Essays on Software Engineering, the book with the hostile-looking prehistoric animals on the cover.

Which sounds about right: a book written about software engineering in 1975? Prehistoric and somewhat intimidating.

The cover drawing, spanning both front and back covers, is repeated on page 2, with the following credit:

C. R. Knight, Mural of La Brea Tar Pits
Photography Section of the Natural History Museum of Los Angeles County.

This in itself merits a brief diversion.

The drawing is a line engraving by someone named Willoughby, and was used as cover and front matter illustration for the third edition (September 1946) of a booklet titled Rancho La Brea: A Record of Pleistocene Life in California, by Chester Stock, published by the Los Angeles County Museum of Natural History. (The drawing does not appear in the 1930 first edition or the 1942 revised edition, but has been retained for subsequent editions.)

The mural is allegedly titled The Deathtrap of the Ages and was painted in 1925 by Charles R. Knight for the Los Angeles County Museum of Natural History:

Image placed in the public domain, courtesy of the University of Southern California Libraries and the California Historical Society.

Knight had previously painted a very similar scene with only minor differences (the tree branches, the position of the attacking felines, some of the animals in the background) in December 1921 for a mural at the American Museum of Natural History in New York City. AMNH describes the scene as follows:

Mural, oil painting. Pacific coast (La Brea Tar Pits fauna near Los Angeles), sabre-tooth tiger (smilodon), ground sloth, columbian mammoth, extinct vulture, canis dirus. Sloths stuck in tar pits being attacked by smilodons, vultures watching from trees, wild dogs and mastodons in distance, mountains in background. The mural hung in the Hall of the Age of Man [1921-1966].

Unlike other depictions of the prehistoric — think of dinosaurs plodding around in ancient swamps — which form a sort of mythical collective consciousness of some undisclosed location made unreachable by time and distance, the Tar Pits are a real place! If you ever find yourself in Los Angeles, with nothing to do, I would highly recommend a visit to the La Brea Tar Pits. Los Angeles County has constructed the George C. Page Museum atop this unique site, where paleontological excavation goes on year-round. I made a pilgrimage there in 1996, to see the place I had once envisioned from the pages of a school textbook. Our school district had adopted the Ginn 720 Reading Series, edited by Theodore Clymer, with whimsical titles such as How It Is Nowadays and A Lizard to Start With. My reading group, in second or third grade, used Tell Me How the Sun Rose. On page 302 began the story, TRAGEDY OF THE TAR PITS, by dinosaur fossil-hunter and AMNH museum director Roy Chapman Andrews:

A pastoral scene — with saber-toothed tiger.

Gradually, the story unfolds: one by one, the curious and unlucky creatures go to see what this stuff is, venturing in only to find themselves sucked down into sticky tar.

Now it has become a horror-story vision: prehistoric animal death trap! The children’s reading circle giggles, preparing us for Edgar Allan Poe stories at some later date.

Even better: it’s a real place, in California!

Once upon a time not too long ago, southern California was home to oilfields, such as the Salt Lake Oil Field — which is the source of the La Brea Tar Pits — and the Beverly Hills Oil Field. Some of the oil is still there, although not as much as back in the very early 1900s. The L.A.-area oil fields fed the growth of Los Angeles from a small town to a sprawling metropolis. The oil fields feature prominently in Raymond Chandler’s 1939 noir novel, The Big Sleep. Oil derricks once loomed apocalyptically over Huntington Beach, but today the derricks are gone, and the wells have been camouflaged or shut down over the years. (“The land has gotten so expensive—they don’t want to tie it up with the oil wells.”)

Huntington Beach, 1926. Photo courtesy Orange County Archives.

As to exactly what inspired Fred Brooks to choose the tar pits as metaphor — a visit to Los Angeles? — we can no longer ask him.

Metaphor is one of the strengths of Mythical Man-Month. Tar pits, cathedrals, the Tower of Babel (of which I have written previously), surgeons, ten pounds in a five-pound sack… this period of IBM’s mainframe heyday must have left an immense impression on Brooks’s psyche. As Brooks states in a 2010 Wired interview:

As I was leaving IBM, Thomas Watson Jr. asked me, “You’ve run the hardware part of the IBM 360, and you’ve run the software part; what’s the difference between running the two?” I told him that was too hard a question for an instant answer but that I would think about it. My answer was The Mythical Man-Month.

In the preface, Brooks refers to the chapters of the book as separable essays:

My own conclusions are embodied in the essays that follow, which are intended for professional programmers, professional managers, and especially professional managers of programmers.

Although written as separable essays, there is a central argument contained especially in Chapters 2-7. Briefly, I believe that large programming projects suffer management problems different in kind from small ones, due to division of labor. I believe the critical need to be the preservation of the conceptual integrity of the product itself. These chapters explore both the difficulties of achieving this unity and methods for doing so. The later chapters explore other aspects of software engineering management.

My advice to anyone reading it is to ignore the supporting details that Brooks cites, which predate the personal computer era (from Chapter 3: “… a structured programming language such as PL/I....”; from Chapter 9: “On a Model 165, memory rents for about $12 per kilobyte per month.”; from Chapter 12: “If a separate machine is needed, it is a rather peculiar thing—it need not be fast, but it needs at least a million bytes of main storage, a hundred million bytes of on-line disk, and terminals.”), and are about as present today as the giant ground sloths of La Brea and the oil derricks of Huntington Beach. Focus instead on the underlying concepts and metaphors, which are timeless.

To give a flavor of the content I will share a few excerpts.

Chapter 3: The Surgical Team

Here is one of my favorite passages:

The dilemma is a cruel one. For efficiency and conceptual integrity, one prefers a few good minds doing design and construction. Yet for large systems one wants a way to bring considerable manpower to bear, so that the product can make a timely appearance. How can these two needs be reconciled?

Mills’s Proposal

A proposal by Harlan Mills offers a fresh and creative solution. Mills proposes that each segment of a large job be tackled by a team, but that the team be organized like a surgical team rather than a hog-butchering team. That is, instead of each member cutting away on the problem, one does the cutting and the others give him every support that will enhance his effectiveness and productivity.

A little thought shows that this concept meets the desiderata, if it can be made to work. Few minds are involved in design and construction, yet many hands are brought to bear. Can it work? Who are the anesthesiologists and nurses on a programming team, and how is the work divided? Let me freely mix metaphors to suggest how such a team might work if enlarged to include all conceivable support.

The surgeon. Mills calls him a chief programmer. He personally defines the functional and performance specifications, designs the program, codes it, tests it, and writes its documentation. He writes in a structured programming language such as PL/I, and has effective access to a computing system which not only runs his tests but also stores the various versions of his programs, allows easy file updating, and provides text editing for his documentation. He needs great talent, ten years experience, and considerable systems and application knowledge, whether in applied mathematics, business data handling, or whatever.

The copilot. He is the alter ego of the surgeon, able to do any part of the job, but is less experienced. His main function is to share in the design as a thinker, discussant, and evaluator. The surgeon tries ideas on him, but is not bound by his advice. The copilot often represents his team in discussions of function and interface with other teams. He knows all the code intimately. He researches alternative design strategies. He obviously serves as insurance against disaster to the surgeon. He may even write code, but he is not responsible for any part of the code.

The administrator. The surgeon is boss, and he must have the last word on personnel, raises, space, and so on, but he must spend almost none of his time on these matters. Thus he needs a professional administrator who handles money, people, space, and machines, and who interfaces with the administrative machinery of the rest of the organization. Baker suggests that the administrator has a full-time job only if the project has substantial legal, contractual, reporting, or financial requirements because of the user-producer relationship. Otherwise, one administrator can serve two teams.

The editor. The surgeon is responsible for generating the documentation—for maximum clarity he must write it. This is true of both external and internal descriptions. The editor, however, takes the draft or dictated manuscript produced by the surgeon and criticizes it, reworks it, provides it with references and bibliography, nurses it through several versions, and oversees the mechanics of production.

Two secretaries. The administrator and the editor will each need a secretary; the administrator’s secretary will handle project correspondence and non-product files.

The program clerk. He is responsible for maintaining all the technical records of the team in a programming-product library. The clerk is trained as a secretary and has responsibility for both machine-readable and human-readable files.

All computer input goes to the clerk, who logs and keys it if required. The output listings go back to him to be filed and indexed. The most recent runs of any model are kept in a status notebook; all previous ones are filed in a chronological archive.

Absolutely vital to Mills’s concept is the transformation of programming “from private art to public practice” by making all the computer runs visible to all team members and identifying all programs and data as team property, not private property.

The specialized function of the program clerk relieves programmers of clerical chores, systematizes and ensures proper performance of those oft-neglected chores, and enhances the team’s most valuable asset—its work-product. Clearly the concept as set forth above assumes batch runs. When interactive terminals are used, particularly those with no hard-copy output, the program clerk’s functions do not diminish, but they change. Now he logs all updates of team program copies from private working copies, still handles all batch runs, and uses his own interactive facility to control the integrity and availability of the growing product.

The toolsmith. File-editing, text-editing, and interactive debugging services are now readily available, so that a team will rarely need its own machine and machine-operating crew. But these services must be available with unquestionably satisfactory response and reliability; and the surgeon must be sole judge of the adequacy of the service available to him. He needs a toolsmith, responsible for ensuring this adequacy of the basic service and for constructing, maintaining, and upgrading special tools—mostly interactive computer services—needed by his team. Each team will need its own toolsmith, regardless of the excellence and reliability of any centrally provided service, for his job is to see to the tools needed or wanted by his surgeon, without regard to any other team’s needs. The tool-builder will often construct specialized utilities, catalogued procedures, macro libraries.

The tester. The surgeon will need a bank of suitable test cases for testing pieces of his work as he writes it, and then for testing the whole thing. The tester is therefore both an adversary who devises system test cases from the functional specs, and an assistant who devises test data for the day-by-day debugging. He would also plan testing sequences and set up the scaffolding required for component tests.

The language lawyer. By the time Algol came along, people began to recognize that most computer installations have one or two people who delight in mastery of the intricacies of a programming language. And these experts turn out to be very useful and very widely consulted. The talent here is rather different from that of the surgeon, who is primarily a system designer and who thinks representations. The language lawyer can find a neat and efficient way to use the language to do difficult, obscure, or tricky things. Often he will need to do small studies (two or three days) on good technique. One language lawyer can service two or three surgeons.

This, then, is how 10 people might contribute in well-differentiated and specialized roles on a programming team built on the surgical model.

How It Works

The team just defined meets the desiderata in several ways. Ten people, seven of them professionals, are at work on the problem, but the system is the product of one mind—or at most two, acting uno animo.

Notice in particular the differences between a team of two programmers conventionally organized and the surgeon-copilot team. First, in the conventional team the partners divide the work, and each is responsible for design and implementation of part of the work. In the surgical team, the surgeon and copilot are each cognizant of all of the design and all of the code. This saves the labor of allocating space, disk accesses, etc. It also ensures the conceptual integrity of the work.

Second, in the conventional team the partners are equal, and the inevitable differences of judgment must be talked out or compromised. Since the work and resources are divided, the differences in judgment are confined to overall strategy and interfacing, but they are compounded by differences of interest—e.g., whose space will be used for a buffer. In the surgical team, there are no differences of interest, and differences of judgment are settled by the surgeon unilaterally. These two differences—lack of division of the problem and the superior-subordinate relationship—make it possible for the surgical team to act uno animo.

I remember reading this about twenty years ago, and with stars in my eyes I knew I wanted to work on such a team, with myself as the “surgeon”, of course. The possibility never materialized… but I’ve had time over the years to reflect on the suitability of this surgical-team ideal.

First, times have changed, and the need for the “program clerk” has gone away. The availability of personal computers has made software editing, execution, debugging, and documentation much more accessible, and with version control systems and other software tools for organizing documentation, we no longer need a specialized person to organize records. At larger companies a “documentation control” department may be involved. Secretaries have faded in significance, and the title changed to administrative assistant… at my previous employer, the ratio of technical staff to administrative assistants was probably 40:1, and where I now work, it is maybe 100:1 or 150:1.

Second, the role of an administrator (today more appropriately called a “project manager”) is underappreciated in Brooks’s essay. Someone needs to manage interaction with stakeholders and to promote the project within the organization to management so that the project receives support. Taking for granted that a project should continue is a recipe for disaster.

Third — and I think this is the key takeaway — the concept of roles is much more significant than whether those roles are individually assigned to different people. (Brooks goes on to talk more about roles and different organizational structures in Chapter 7.) Once, after working on a frustrating project that was underprepared and understaffed, I wrote a memo about the roles needed to design, implement, and test a successful digitally-controlled motor drive. I think I got to about 10 different roles. But, I emphasized, this didn’t mean that 10 people were needed — only that the roles needed adequate coverage, otherwise the team wasn’t likely to deliver on schedule or even to deliver at all. I see Brooks’s surgical team as a collection of roles, and the individual strengths of team members dictate how to manage task specialization.

The agile software approach of sprints and scrum may mistakenly evoke a vision of a uniform pool of interchangeable software developers, who can each attack user stories much like Brooks’s hog-butchering analogy: work is self-assigned by the next available developer. In one article, Mike Cohn discusses the value of specialists in software development. Specialization is good as long as it doesn’t trap the team as a collection of individuals who each can only work in one way, leaving the team exposed to the vagaries of one person’s vacation plans or other unforeseen events.

The important thing here: identify the roles! Do not underestimate the need for certain team functions, and be aware that not all software developers have high abilities in everything, whether it be technical writing, mathematical analysis, organizing of information, discipline of following standards, the making and maintenance of software tools, or the creation of pathological but important test cases.

Chapter 4: Aristocracy, Democracy, and System Design

Here is where Brooks discusses “conceptual integrity” and the difference between architecture and implementation:

Aristocracy and Democracy

Conceptual integrity in turn dictates that the design must proceed from one mind, or from a very small number of agreeing resonant minds.

Schedule pressures, however, dictate that system building needs many hands. Two techniques are available for resolving this dilemma. The first is a careful division of labor between architecture and implementation. The second is the new way of structuring programming implementation teams discussed in the previous chapter.

The separation of architectural effort from implementation is a very powerful way of getting conceptual integrity on very large projects. I myself have seen it used with great success on IBM’s Stretch computer and on the System/360 computer product line. I have seen it fail through lack of application on Operating System/360.

By the architecture of a system, I mean the complete and detailed specification of the user interface. For a computer this is the programming manual. For a compiler it is the language manual. For a control program it is the manuals for the language or languages used to invoke its functions. For the entire system it is the union of the manuals the user must consult to do his entire job.

The architect of a system, like the architect of a building, is the user’s agent. It is his job to bring professional and technical knowledge to bear in the unalloyed interest of the user, as opposed to the interests of the salesman, the fabricator, etc.

Architecture must be carefully distinguished from implementation. As Blaauw has said, “Where architecture tells what happens, implementation tells how it is made to happen.” He gives as a simple example a clock, whose architecture consists of the face, the hands, and the winding knob. When a child has learned this architecture, he can tell time as easily from a wristwatch as from a church tower. The implementation, however, and its realization, describe what goes on inside the case—powering by any of many mechanisms and accuracy control by any of many.

Brooks later emphasizes the importance of keeping the architecture the responsibility of a small group of people, with a cautionary tale:

What Does the Implementer Do While Waiting?

It is a very humbling experience to make a multimillion-dollar mistake, but it is also very memorable. I vividly recall the night we decided how to organize the actual writing of external specifications for OS/360. The manager of architecture, the manager of control program implementation, and I were threshing out the plan, schedule, and division of responsibilities.

The architecture manager had 10 good men. He asserted that they could write the specifications and do it right. It would take ten months, three more than the schedule allowed.

The control program manager had 150 men. He asserted that they could prepare the specifications, with the architecture team coordinating; it would be well-done and practical, and he could do it on schedule. Furthermore, if the architecture team did it, his 150 men would sit twiddling their thumbs for ten months.

To this the architecture manager responded that if I gave the control program team the responsibility, the result would not in fact be on time, but would also be three months late, and of much lower quality. I did, and it was. He was right on both counts. Moreover, the lack of conceptual integrity made the system far more costly to build and change, and I would estimate that it added a year to debugging time.

Many factors, of course, entered into that mistaken decision; but the overwhelming one was schedule time and the appeal of putting all those 150 implementers to work. It is this siren song whose deadly hazards I would now make visible.

These are war stories! The only way to avoid repeating these mistakes with disastrous consequences is to hear from our forebears and make enough of our own little errors that we learn the lessons before the stakes are high.

As for the split between architecture (or “design”) and implementation: I still find it odd how someone can go into a code review with just an implementation — no design — and invite reviewers to approve. How do I know whether the implementation is correct, when I have nothing to judge it against?

Write down your design plans first, before implementing — even if it’s just a few sentences, it helps structure and constrain the work to come.

Chapters 5 (The Second-System Effect) and 11 (Plan to Throw One Away)

Brooks makes the case that the chief architect of any system should have experience on at least two previous efforts:

Self-Discipline—The Second-System Effect

An architect’s first work is apt to be spare and clean. He knows he doesn’t know what he’s doing, so he does it carefully and with great restraint.

As he designs the first work, frill after frill and embellishment after embellishment occur to him. These get stored away to be used “next time.” Sooner or later the first system is finished, and the architect, with firm confidence and a demonstrated mastery of that class of systems, is ready to build a second system.

This second is the most dangerous system a man ever designs. When he does his third and later ones, his prior experiences will confirm each other as to the general characteristics of such systems, and their differences will identify those parts of his experience that are particular and not generalizable.

The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one. The result, as Ovid says, is a “big pile.” For example, consider the IBM 709 architecture, later embodied in the 7090. This is an upgrade, a second system for the very successful and clean 704. The operation set is so rich and profuse that only about half of it was regularly used.

The first work is an experiment, fraught with peril:

Pilot Plants and Scaling Up

Chemical engineers learned long ago that a process that works in the laboratory cannot be implemented in a factory in only one step. An intermediate step called the pilot plant is necessary to give experience in scaling quantities up and in operating in nonprotective environments. For example, a laboratory process for desalting water will be tested in a pilot plant of 10,000 gallon/day capacity before being used for a 2,000,000 gallon/day community water system.

Programming system builders have also been exposed to this lesson, but it seems to have not yet been learned. Project after project designs a set of algorithms and then plunges into construction of customer-deliverable software on a schedule that demands delivery of the first thing built.

In most projects, the first system built is barely usable. It may be too slow, too big, awkward to use, or all three. There is no alternative but to start again, smarting but smarter, and build a redesigned version in which these problems are solved. The discard and redesign may be done in one lump, or it may be done piece-by-piece. But all large-system experience shows that it will be done. Where a new system concept or new technology is used, one has to build a system to throw away, for even the best planning is not so omniscient as to get it right the first time.

The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. The only question is whether to plan in advance to build a throwaway, or to promise to deliver the throwaway to customers. Seen this way, the answer is much clearer. Delivering that throwaway to customers buys time, but it does so only at the cost of agony for the user, distraction for the builders while they do the redesign, and a bad reputation for the product that the best redesign will find hard to live down. Hence plan to throw one away; you will, anyhow.

This resonates with me. I have seen at least three systems put into production (two for internal tools, one for the general public) that were “${ENGINEER}’s first programming project in ${LANGUAGE}”, which were never redone. The designs all showed great care, but inexperience, and as a result, users were stuck with quirky limitations of the design. Other systems I have seen were the result of similar inexperience at an architectural level. You can’t incrementally “morph” a poor design into something good through agile methods.

Takeaways here are that we need experienced system architects; we must identify areas of architectural risk, and make time to prototype them somehow, or our production designs will be the ungainly prototype.

Chapters 2 (The Mythical Man-Month) and 8 (Calling the Shot)

On schedule estimation, and the tendency to underestimate:

Charles Portman, manager of ICL’s Software Division, Computer Equipment Organization (Northwest) at Manchester, offers another useful personal insight.

He found his programming teams missing schedules by about one-half—each job was taking approximately twice as long as estimated. The estimates were very careful, done by experienced teams estimating man-hours for several hundred subtasks on a PERT chart. When the slippage pattern appeared, he asked them to keep careful daily logs of time usage. These showed that the estimating error could be entirely accounted for by the fact that his teams were only realizing 50 percent of the working week as actual programming and debugging time. Machine downtime, higher-priority short unrelated jobs, meetings, paperwork, company business, sickness, personal time, etc. accounted for the rest. In short, the estimates made an unrealistic assumption about the number of technical work hours per man-year. My own experience quite confirms his conclusion.
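Portman's finding comes down to one division: if only a fraction of the working week is realized as technical work, the calendar time is the estimate divided by that fraction. Here is a minimal sketch of that arithmetic; the function name and the `utilization` parameter are hypothetical, chosen only for illustration, and the 0.5 default is simply the 50 percent figure from the passage above.

```python
def calendar_weeks(estimated_work_weeks: float, utilization: float = 0.5) -> float:
    """Convert an estimate of pure technical work-weeks into calendar weeks.

    utilization is the fraction of the working week actually spent
    programming and debugging (assumed here to be 0.5, per the quoted passage).
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be greater than 0 and at most 1")
    return estimated_work_weeks / utilization


# A job estimated at 10 weeks of technical work, at 50 percent utilization:
print(calendar_weeks(10))  # 20.0
```

The estimate itself can be perfectly careful and still be off by a factor of two; the error lives entirely in the assumed utilization.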

A related passage on scheduling:

Systems Test

No parts of the schedule are so thoroughly affected by sequential constraints as component debugging and system test. Furthermore, the time required depends on the number and subtlety of the errors encountered. Theoretically this number should be zero. Because of optimism, we usually expect the number of bugs to be smaller than it turns out to be. Therefore testing is usually the most mis-scheduled part of programming. For some years I have been successfully using the following rule of thumb for scheduling a software task:

⅓ planning

⅙ coding

¼ component test and early system test

¼ system test, all components in hand.

This differs from conventional scheduling in several important ways:

The fraction devoted to planning is larger than normal. Even so, it is barely enough to produce a detailed and solid specification, and not enough to include research or exploration of totally new techniques.

The half of the schedule devoted to debugging of completed code is much larger than normal.

The part that is easy to estimate, i.e., coding, is given only one-sixth of the schedule.

In examining conventionally scheduled projects, I have found that few allowed one-half of the projected schedule for testing, but that most did indeed spend half of the actual schedule for that purpose. Many of these were on schedule until and except in system testing.

Failure to allow enough time for system test, in particular, is peculiarly disastrous. Since the delay comes at the end of the schedule, no one is aware of schedule trouble until almost the delivery date. Bad news, late and without warning, is unsettling to customers and to managers.

Furthermore, delay at this point has unusually severe financial, as well as psychological, repercussions. The project is fully staffed, and cost-per-day is maximum. More seriously, the software is to support other business effort (shipping of computers, operation of new facilities, etc.) and the secondary costs of delaying these are very high, for it is almost time for software shipment.

Indeed, these secondary costs may far outweigh all others. It is therefore very important to allow enough system test time in the original schedule.
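To make the rule of thumb concrete, here is a rough sketch that splits a total schedule using the four fractions quoted above. The names `PHASES` and `split_schedule`, and the 24-week total, are hypothetical and only for illustration; exact fractions are used so that the shares sum to exactly one.

```python
from fractions import Fraction

# Brooks's rule-of-thumb fractions, as quoted above.
PHASES = {
    "planning": Fraction(1, 3),
    "coding": Fraction(1, 6),
    "component test and early system test": Fraction(1, 4),
    "system test, all components in hand": Fraction(1, 4),
}


def split_schedule(total_weeks: int) -> dict[str, Fraction]:
    """Allocate a total schedule across the phases using the fractions above."""
    # Sanity check: the four fractions cover the whole schedule.
    assert sum(PHASES.values()) == 1
    return {phase: share * total_weeks for phase, share in PHASES.items()}


for phase, weeks in split_schedule(24).items():
    print(f"{phase}: {weeks} weeks")
```

For a 24-week project this allots 8 weeks to planning, 4 to coding, and 12 to the two testing phases combined, which is the half of the schedule that conventional plans tend to leave out.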

Underestimation is so ingrained in our nature as software engineers that we must be constantly vigilant and aware of its potential causes. I have been technical lead on a project at work that is currently behind schedule, and it is humbling to remind myself that I need to re-read Mythical Man-Month more often.

Wrapup

I am grateful that Frederick Brooks took the time to write The Mythical Man-Month and share his experiences and perspectives, shaped by hard-won efforts on the OS/360 project. These war stories on software architecture and project management, despite the archaic nature of the early-1960s mainframe, are nearly all still relevant today. Some of the quaint limitations of systems back in Brooks’s day — limited memory, need for separate target machines — are still reasonably accurate for embedded systems.

The general is dead, but long live the general’s stories.

