
Scorchers, Part 2: Unknown Bugs and Popcorn

Jason Sachs, April 5, 2020

This is a short article about diminishing returns in the context of software releases.

Those of you who have been working professionally on software or firmware have probably faced this dilemma before. The scrum masters of the world will probably harp on terms like the Definition of Done and the Minimum Viable Product. Blah blah blah. In simple terms, how do you know when your product is ready to release? This is both an easy and a difficult question to answer.

What makes it easy is an agreement on which features must be included. The Magic Happy Mobile Blender must have three speeds. Pressing on the up arrow should increase the speed. Pressing on the down arrow should decrease the speed. If the safety interlock switch is not pressed, the blender should stop. If the sensed motor temperature reaches 105°C, the blender should stop. And so on. These are requirements. A Definition of Done. Minimum Viable Product. Blah blah blah. You can perform tests for these, and the release is either done or not done.

What makes it difficult is the inevitable tension between time-to-market and quality. We can include these features or address these bugs, but it will take longer. So the stakeholders get together and decide that, okay, we don’t care about the automatic cleaning cycle, so we can drop that feature. And it’s all right if capacitive switch detection fails most of the time when one person presses the BLEND buttons on two blenders at once, using a finger from each hand; we’ll ignore that bug and close it as Won’t Fix. These are judgment calls. They can be contentious. I don’t enjoy those kinds of meetings. Deciding to drop a feature can be demoralizing, especially if you’ve spent a lot of time working toward it.

At the moment, however, I’m more interested in the bugs, and in something that Donald Rumsfeld said back in 2002:

Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.

What does this have to do with software releases? There is a stage in the product cycle when we’re finding and fixing bugs, and all we know about are the bugs we have already found. We don’t know whether there are bugs that haven’t been identified yet! And we won’t know until someone reports them: if you don’t find the bugs, your customers will. There are various studies on how much more it costs to fix bugs that are found late:

A significant related insight is that the cost of fixing or reworking software is much smaller (by factors of 50 to 200) in the earlier phases of the software life cycle than in the later phases [Boehm, 1976; Fagan, 1976; Daly, 1977].

[Boehm, 1988]

Finding and fixing a software problem after delivery is often 100 times more expensive than finding and fixing it during the requirements and design phase.

As Boehm observed in 1987, “This insight has been a major driver in focusing industrial software practice on thorough requirements analysis and design, on early verification and validation, and on up-front prototyping and simulation to avoid costly downstream fixes.”

For this updated list, we have added the word “often” to reflect additional insights about this observation. One insight shows the cost-escalation factor for small, noncritical software systems to be more like 5:1 than 100:1. This ratio reveals that we can develop such systems more efficiently in a less formal, continuous prototype mode that still emphasizes getting things right early rather than late.

[Boehm and Basili, 2001]

Defects found in testing were 15 times more costly than if they were found during the design phase and 2 times more than if found during implementation.

[Dawson et al, 2010]

The Rule of Ten states that after each quality assurance level it will cost 10 times more in terms of time and money to correct and fix a defect as in the prior stage. If it takes $100 to fix a defect at unit testing, it takes $1,000 at system testing, $10,000 at UAT, and $100,000 at production.

[Standish Group, 2014]

and so on. Unfortunately, many of these assertions are presented without any explanation of why the costs increase. One way to think of it is that software undergoes “phase transitions” over the development life cycle: it is “gaseous” during architecture and design (no code has been written yet), “liquid” during implementation, “gelatinous” during testing, and “solid” after release. Each transition adds to what a change involves: a change made during implementation means changing the code, whereas a change made during testing means changing the code and also redoing the tests. Johanna Rothman cites an example of three companies; here is what happened at Company A:

While Company A didn’t track the time to detect and fix a defect during the requirements, design, and development phases, it did track the engineering time required to fix a defect after the product shipped. After shipping, Company A ran into a common problem: Some customers were quite disappointed by product quality — most frequently, in the areas where the engineers were unable to fix the defects before release. When an unhappy Very Important Customer called senior management, senior management then demanded that Engineering fix the defect as an “emergency” fix.

Fixing the defect after shipping requires additional cost. The developers still had to find the defect, and the testers had to verify the fix. In addition, for each fix the writers had to develop release notes, the build engineers had to create a separate branch, and the testers had to do system-level testing to verify no other egregious defects were caused. Because the product was already released, Company A’s additional system testing costs included:

  • Adding regression tests to the test suite
  • Testing more of the areas around the fix
  • Performing extensive system testing on the different hardware/software platform combinations

During normal system testing, the testers skipped many of these steps because of insufficient time. They could not skip these steps after release, though — testers had to verify that this fix would not cause problems for other customers.

After the fix, the build engineers merged the fix back into the current development branch, and the testers verified that the fix still worked in the current development branch. Aside from the engineering work, middle management spent significant time tracking the progress of the emergency fix work and reporting that progress to senior management and the customer.

This extra work adds up. After release, Company A estimated that it spent 20 person-days per fix. At $500 per person-day, this comes to $10,000 per fix. Company A typically had 20 “emergency” fixes after each release, for a total post-release fix cost of about $200,000.

[Rothman, 2000]
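Company A’s figures are simple multiplication, and writing them out makes it clear how little of the per-fix cost is the code change itself. A quick Python check; the half-day implementation time is a hypothetical value I chose for illustration, not a number from Rothman’s article.

```python
# Company A's post-release figures, from the Rothman excerpt above
PERSON_DAYS_PER_FIX = 20
DOLLARS_PER_PERSON_DAY = 500
FIXES_PER_RELEASE = 20

cost_per_fix = PERSON_DAYS_PER_FIX * DOLLARS_PER_PERSON_DAY   # $10,000
cost_per_release = cost_per_fix * FIXES_PER_RELEASE           # $200,000

# Hypothetical: suppose the code change itself took half a person-day.
IMPLEMENTATION_DAYS = 0.5
implementation_share = IMPLEMENTATION_DAYS / PERSON_DAYS_PER_FIX

print(f"Per fix: ${cost_per_fix:,}; per release: ${cost_per_release:,}")
print(f"Implementation share of each fix: {implementation_share:.1%}")  # 2.5%
```

Under that assumption, the “easy” part of each emergency fix is about 2.5% of the effort Company A actually spent.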

So the next time you run across one of these emergency bugs post-release, and one of your team members goes, “Oh, hey, that’s easy to fix, it’ll just take me a few minutes,” gently remind them that implementation time is only a minor portion of the effort needed to complete an updated release of a product.

There are also additional costs to doing this kind of emergency bug-fixing work:

  • the opportunity cost of pulling staff away from the tasks they were working on before the emergency popped up: those tasks are going to be delayed, or their scope may need to be reduced

  • the cost of context-switching and losing focus — if I am in the middle of work on Complex Task A which would normally take me 5 days, and someone disrupts me to work on High Priority Task B which takes 2 days, how long is the total time it takes me to complete both tasks A and B? The answer is not going to be 7 days; instead, it will be 7 days plus

    • whatever time it takes me to wrap things up with Task A so that I can resume it more easily after Task B is complete
    • the time to change my focus from Task A to Task B
    • the time to change my focus back to Task A after Task B is done.

    This extra cost isn’t too bad for short tasks, but on a complex multi-week task, handling a disruption might cost me a whole day. That doesn’t mean I’m sitting there scratching my head for a day before I can resume work, but rather that until I get back in the zone, with Task A in my short- and medium-term memory, I’m operating at reduced productivity, and I may have to stop and reacquaint myself with aspects of Task A.
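The Task A / Task B arithmetic can be made explicit. This is only a sketch: the three overhead values are made-up illustrations (treating reduced productivity as an equivalent number of full days), since the real numbers depend on the person and the task.

```python
def total_elapsed_days(task_a, task_b, wrap_up_a, switch_to_b, switch_back_to_a):
    """Days to finish Task A and Task B when B interrupts A partway through.

    The three overheads are the extra costs of a disruption: wrapping up A so
    it can be resumed, refocusing on B, and refocusing on A afterward.
    Lost productivity is expressed here as equivalent full days (an assumption).
    """
    return task_a + task_b + wrap_up_a + switch_to_b + switch_back_to_a

# A 5-day Task A interrupted by a 2-day Task B; overhead values are made up.
print(total_elapsed_days(5, 2, wrap_up_a=0.5, switch_to_b=0.25, switch_back_to_a=1.0))  # 8.75
```

Only with all three overheads at zero does the total come out to the naive 7 days.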

At any rate — there’s a strong incentive to address a bug before release rather than after.

Scrum teams doing frequent delivery may argue that there’s no “before” or “after” release, just a continuous series of releases, so it doesn’t cost any more to address a bug in version 1.04 than in version 1.03. Well, sure, that’s true, as long as customers are okay waiting a little bit longer. Once a bug is present in a published version of software, the cost to developers shouldn’t change if a fix is delayed; it might even decrease, because some constraints are relaxed and the fix can wait until the developers are ready to focus on it. But imagine for a moment this sequence of events:

  • Version 1.02 is released
  • Version 1.03 is being developed, and three bugs are accidentally introduced and identified in the process: issues 1152, 1153, and 1154.
  • Issue 1152 is fixed during development
  • The team argues about whether 1153 and 1154 should be fixed for release 1.03 and eventually decides that they will not be fixed for 1.03; instead, they will be fixed for 1.04
  • Development work on 1.03 completes
  • Version 1.03.01 is created by an automated build
  • Integration testing starts on version 1.03.01 — the team has a bunch of automated tests already, but there are a few things that they can’t automate (perhaps this is an embedded system and the hardware doesn’t lend itself to automated testing at a system level)
  • During testing, a new bug is found: issue 1155
  • Vice President XYZ steps into a meeting and says that issues 1153 and 1155 are urgent must-fix items, or customers R and S will take their business elsewhere, and that 1154 doesn’t need to be fixed yet but is important and should be mentioned in documentation and training. Furthermore, the team needs to release ASAP.
  • The team fixes 1153 and 1155, taking some shortcuts (call them UVW) that bypass their normal best software practices (SOLID, DRY, etc.); issues 1154 and 1155 are closely related, so the fix for 1155 involves working around issue 1154
  • Version 1.03.02 is created by an automated build
  • Integration testing restarts on version 1.03.02
  • Integration testing is complete
  • Documentation and training materials are updated to mention issue 1154
  • Version 1.03.02 is published — team celebrates!

Now we have several classes of bugs, each with different costs:

  • Cost to fix issue 1152: implementation only
  • Cost to fix issue 1153:
    • implementation and testing
    • shortcuts UVW will need to be fixed later
  • Cost to fix issue 1155:
    • implementation and testing
    • increased implementation cost to work around issue 1154
    • shortcuts UVW will need to be fixed later
    • workaround will need to be fixed later
  • Cost to fix issue 1154:
    • Version 1.03.02: documentation and training materials
    • Future version: implementation and testing, and updating documentation and training a second time

The further a bug gets into the product release cycle, the more effort is needed to fix it and to deal with the consequences of leaving it unfixed.

Popcorn

Bugs don’t just reveal themselves; they have to be found. Some are easy to find, some are not. I have a mental model of popcorn kernels:

Software bugs                            | Popcorn
---------------------------------------- | ----------------------
One bug                                  | One popcorn kernel
Bug that exists but has not been found   | Unpopped kernel
Bug that has been found                  | Popped kernel
Finding bugs                             | Heating up the kernels

If we’re making a bowl of microwaved popcorn, we have to decide how long to run the microwave. Shorter times = more unpopped kernels; yuck. Longer times = fewer unpopped kernels; hurray. But if we go too long, we start to become impatient, and some of the popcorn burns.

There’s also a pretty easy way to decide when to stop, if you’re willing to stay by the microwave and wait: the pops come fast and furious and then start to slow down, so when they get slow enough, you stop the process and take out your popcorn. You don’t have to study mathematics or physics to learn this; it’s almost an intuitive strategy for anyone who has made popcorn in the microwave.
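The listen-for-the-slowdown strategy is simple enough to write down as an algorithm: once the popping is under way, stop when the silence after a pop exceeds some maximum gap. Here is a minimal Python sketch, assuming we have the times of the pops in seconds. The `min_pops` warm-up guard is my own addition, because the first few pops also arrive slowly and a naive gap rule would stop before the popping really gets going.

```python
def stop_time(pop_times, max_gap, min_pops=10):
    """Return when to stop heating, or None if the popping hasn't got going yet.

    pop_times: times (seconds) at which pops were heard
    max_gap:   stop once this much silence follows a pop (e.g. 1 to 2 s)
    min_pops:  hypothetical warm-up guard; gaps are ignored until this many
               pops have been heard, since the first pops are sparse too
    """
    pops = sorted(pop_times)
    for i, t in enumerate(pops):
        if i + 1 < min_pops:
            continue
        # Stop if the next pop is too far away, or if no further pop was heard.
        if i + 1 == len(pops) or pops[i + 1] - t > max_gap:
            return t + max_gap
    return None

# Made-up pop times: a slow start, a burst, then a tail of stragglers.
pops = [0.0, 3.0] + [5.0 + 0.1 * k for k in range(11)] + [6.5, 7.5, 11.0]
print(stop_time(pops, max_gap=2.0))              # 9.5
print(stop_time(pops, max_gap=2.0, min_pops=1))  # 2.0: the naive rule stops too early
```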

And this is the idea that I want to get across:

If you’re testing your software, and you are still finding bugs at a frequent rate, expect that there are more bugs that have not been identified.

When is the right time to stop testing? I don’t have an answer; maybe if you’ve gone a week without finding bugs, then that provides enough confidence that any remaining bugs are obscure and hard to find.

I do know that if my team is consistently finding more than one bug a day, then it’s likely that we’ll find more bugs the next day.

I also know that if there is a bug that prevents us from using some features of the software, then there’s a decent chance that unknown bugs will lurk in those features, because we haven’t been able to test them at all. So any bug that prevents complete testing should be given higher priority for fixing, just in terms of risk reduction.

And if time weren’t a significant cost, we’d just keep testing until we don’t find any new bugs for a long time, to gain confidence in our software. (Or more methodically: we’d track test coverage and check that we’ve tested all functions under a foreseeable range of conditions. In some industries this is a requirement — aerospace, automotive, and medical are examples.)

But time is a precious commodity....

I wish I had a more quantitative or evidence-based method for determining the right tradeoff.

In the interest of exploring the popcorn analogy, I decided to see if there were any articles on popcorn that might have some analogous conclusions for software testing. I was surprised by what I found, although most of it was not relevant for my purposes.

One that seemed directly relevant to my popcorn analogy was “Popcorn Making: Leaving No Kernel Unpopped” from Gold Medal Products Company, a manufacturer of food equipment for movie theaters and concession stands:

Other factors for popcorn popping success are being a good listener and popcorn storage. After the machine gun-like cadence at the height of a popping endeavor, listen carefully for when the popping slows down. Wait until there is a second or two between pops before removing the popcorn from the heat source. But don’t wait too long, or the popcorn will begin to burn.

No math or data or theory, though. Bah.

The only other useful references I found were from Paul Pukite, an engineer with BAE Systems who has written on the mathematics of “GeoEnergy,” the geological science of finding hydrocarbons. Here is a blog post of his from October 2009 titled “Popcorn Popping as Discovery”:

In the search for the perfect analogy to oil depletion that might exist in our experiential world, I came across a most mundane yet practical example that I have encountered so far. Prompted by a comment by Memmel at TOD who bandied about the term “simulated annealing” in trying to explain the oil discovery process, I began pondering a bit. At the time, I was microwaving some popcorn and it struck me that the dynamics of the popping in some sense captured the idea of simulated annealing, as well as mimiced the envelope of peak oil itself.

So I suggested as a reply to Memmel’s comment that we should take a look at popcorn popping dynamics. The fundamental question is : Why don’t all the kernels pop at the same time? It took me awhile to lay out the groundwork, but I eventually came out with a workable model, the complexity of which mainly involved the reaction kinetics. Unsurprisingly, the probability and statistics cranked out straightforwardly as it parallels the notion of dispersion that I have worked in terms of the Dispersive Discovery Model.

Pukite goes on to cite Byrd and Perona’s 2005 study, “Kinetics of Popping of Popcorn” (which is unfortunately behind a paywall), and uses his “Dispersive Discovery Model” to fit the data:

$$ \begin{aligned} P(t,T) &= 1 - \frac{e^{-\frac{B}{f(t,T)}}}{1+\frac{A}{f(t,T)}} \\ f(t,T) &= e^{R(t,T)} - e^{R(0,T)} \\ R(t,T) &= k(T-T_c)^2\,t - c(T-T_c) \end{aligned} $$

The authors of the study don’t use the same formulation as I do because the theorists don’t tend to apply the fat-tail dispersion math that I do. Therefore they resort to a first-order approximation which uses a Gaussian envelope to generate some randomness. They essentially do the equivalent of setting the B term to 1 and the A term to 0. This really shows up in the better fit at low temperatures at early popping times....

He then goes on to show a bunch of graphs demonstrating how well the model fits.

The upshot is that this popcorn popping experiment stands as an excellent mathematical analogy for dispersive discovery. If we consider the initial pops that we hear as the initial stirrings of discrete discoveries, then the analogy holds as the popping builds to a crescendo of indistinguishable pops as we reach peak. After that the occasional pop makes its way out. This happens with oil discovery as well, as the occasional discovery pop can occur well down the curve, as we have defined the peak in the early 1960’s.

The same general argument appears in Appendix C, “Dispersion Analogies,” of the 2018 book “Mathematical Geoenergy: Discovery, Depletion, and Renewal” by Pukite, Coyne, and Challou.
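To get a feel for the P(t,T) expression quoted earlier, it can be evaluated numerically. A minimal Python sketch, assuming the argument order R(t,T) in both places it appears, B > 0, and T ≠ T_c; the parameter values in the example are arbitrary, not fitted to Byrd and Perona’s data. Setting A = 0 and B = 1 gives the simpler form that, according to Pukite, the study’s authors effectively used.

```python
import math

def dispersive_p(t, T, A, B, k, c, T_c):
    """Evaluate P(t,T) = 1 - exp(-B/f) / (1 + A/f), where
    f(t,T) = exp(R(t,T)) - exp(R(0,T)) and R(t,T) = k (T - T_c)^2 t - c (T - T_c).

    Assumes B > 0, t >= 0, k > 0 and T != T_c. Very large k (T - T_c)^2 t
    will overflow math.exp.
    """
    def R(t_, T_):
        return k * (T_ - T_c) ** 2 * t_ - c * (T_ - T_c)

    f = math.exp(R(t, T)) - math.exp(R(0.0, T))
    if f <= 0.0:
        # At t = 0, f = 0 and exp(-B/f) -> 0, so P -> 1 in the limit.
        return 1.0
    return 1.0 - math.exp(-B / f) / (1.0 + A / f)

# Arbitrary parameters with A = 0, B = 1: P starts at 1 and falls toward 0.
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(t, round(dispersive_p(t, T=1.0, A=0.0, B=1.0, k=1.0, c=0.0, T_c=0.0), 4))
```

A nonzero A changes how the curve approaches its tail, which is the part of the fit Pukite credits for the better match at low temperatures and early times.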

This idea of “dispersive discovery” sounds very relevant to software engineering. Finding bugs is like finding needles in a haystack, which is the theme of another of Pukite’s articles; it presents some mathematical models of oil discovery, along with some of the insight behind them: some oil is more difficult to discover, primarily because it lies at greater depth.

I can see a lot of parallels with finding software bugs; some are easy to find and others are more difficult, because the set of circumstances needed to reproduce a bug is more complex.

So presumably, if we entered a testing phase on a software project and recorded each day the number of hours we spent looking for bugs and the number of bugs found, we could graph the cumulative number of bugs vs. the cumulative number of hours spent looking for them, fit a curve, and try to extrapolate a few days into the future.

Then we could set some kind of threshold as a cost tradeoff, and I would challenge engineering teams to set the bar higher than they think is necessary; again, the cost of fixing bugs that make it into production is high.
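Here is a rough sketch of that idea in Python. One important caveat: instead of Pukite’s dispersive discovery model, it fits a simpler saturating exponential, bugs(h) = n_inf * (1 - exp(-h / tau)), which has the same shape as the Goel-Okumoto software reliability growth model. The function names, the grid search over tau, the made-up data, and the one-bug threshold are all my own assumptions; real bug data is noisy and would need more care.

```python
import math

def fit_saturating(hours, bugs, taus):
    """Fit bugs(h) = n_inf * (1 - exp(-h / tau)) by least squares.

    hours: cumulative hours spent looking for bugs (must include some h > 0)
    bugs:  cumulative bugs found at the same points
    taus:  candidate time constants (a simple grid search)
    For a fixed tau the model is linear in n_inf, so n_inf has a closed form;
    keep the (n_inf, tau) pair with the smallest sum of squared errors.
    """
    best = None
    for tau in taus:
        g = [1.0 - math.exp(-h / tau) for h in hours]
        n_inf = sum(y * gi for y, gi in zip(bugs, g)) / sum(gi * gi for gi in g)
        sse = sum((y - n_inf * gi) ** 2 for y, gi in zip(bugs, g))
        if best is None or sse < best[0]:
            best = (sse, n_inf, tau)
    return best[1], best[2]

def expected_new_bugs(n_inf, tau, hours_so_far, more_hours):
    """Bugs the fitted curve predicts in the next `more_hours` of testing."""
    def found(h):
        return n_inf * (1.0 - math.exp(-h / tau))
    return found(hours_so_far + more_hours) - found(hours_so_far)

# Made-up cumulative data: hours spent testing and bugs found so far.
hours = [8, 16, 24, 32, 40, 48, 56, 64]
bugs = [6, 11, 15, 18, 20, 22, 23, 24]
n_inf, tau = fit_saturating(hours, bugs, taus=[0.5 * i for i in range(2, 1001)])
next_week = expected_new_bugs(n_inf, tau, hours[-1], 40)
print(f"estimated total {n_inf:.1f}, next 40 h: {next_week:.1f} new bugs")
keep_testing = next_week >= 1.0  # hypothetical threshold
```

Because the model is linear in n_inf once tau is fixed, a one-dimensional grid search avoids needing an optimization library; with SciPy available, `scipy.optimize.curve_fit` would be the usual choice.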

I have data from JIRA on projects I’ve worked on, but unfortunately it only includes the number of bugs found each day, not how many hours we spent looking for them.

Perhaps some of you have taken such a quantitative approach before — if you have, please let me know!

At the very least, please look at the number of bugs you’ve been finding. If you’re still finding them, then there are still some out there, like cockroaches. Note, however, that the converse isn’t true: absence of evidence is not evidence of absence. Test, test, test, and make sure you have planned for good test coverage.

Advice for the embedded world

Embedded systems are a bit different from networked desktop or mobile computers. If my software vendor finds and fixes a bug, they can notify me that there’s an updated version out there, or even push the updates automatically. While this is theoretically true for networked embedded devices as well, firmware upgrades are rarer, for a couple of reasons:

  • Internet-connected embedded systems are still relatively uncommon. I don’t have any reputable statistics to back this up — if you know of any, please let me know — other than checking my own household, where I have only three networked embedded systems (two smart thermostats and one other device) and a Roku streaming video player, but many other embedded systems — a TV, a DVD player, a car (cars nowadays have dozens of microcontrollers), a coffee maker, a refrigerator, a microwave, an irrigation system, a couple of digital cameras, a GPS receiver, a battery charger for my car, smoke alarms, etc. — that are not networked. I’m not even sure the Roku should count as an embedded system; it’s basically just a small single-board computer with a power supply jack, an infrared receiver to pick up signals from remote controls, a WiFi interface, and an HDMI port.

  • The business models of consumer embedded systems don’t lend themselves to supporting firmware upgrades. I have a 10-year-old Panasonic digital camera, the DMC-G2, which was first announced on March 9, 2010. Its last firmware update was published only 8 months later, in November 2010.

    Panasonic doesn’t want to sell me or even give me new firmware; they want to support and sell their latest cameras, which is how they make their money.

If you’re working on embedded firmware, for the most part you have one shot at a release; maybe two or three if you have a high-volume product that is still coming up the popularity curve. Testing is therefore much more important for firmware than for computer software. I think the same “popcorn” advice applies, though: if you’re still finding firmware bugs, you’re not done yet.

Wrapup

Today we talked about when a software release is ready to ship, and how that time relates to the frequency of finding bugs. The cost of fixing bugs late in the release cycle is high compared to finding and fixing them early, during the design and implementation phases, before they incur late-stage costs like testing, documentation, training, or the expedited, high-visibility emergency fix. In the popcorn analogy, unidentified bugs are like unpopped kernels: you should wait until the rate of new bug reports slows down significantly before you’re done; otherwise, more bugs are likely to be uncovered after release, when they become more expensive to fix.

Thanks for reading and stay healthy!


© 2020 Jason M. Sachs, all rights reserved.



