pieterh wrote on 25 Oct 2015 16:21

In this article we'll see how estimate the number of bugs in a code base. This article was originally written by Leif Svalgaard and myself, around 1990. It's old yet still relevant to many projects that haven't adopted a zero-defect approach as the ZeroMQ community has done.

Testing is an important aspect of the development process. Many of us spend more time testing than programming. This was not always so. Computer time was once so expensive that you could not afford much testing. Back then we did “desk-checking”, making sure — as far as we could — that there were no errors or defects in our programs.

Today we read about operating system vendors shipping their wares with 65,000 known defects. There is a nasty but persistent rumor -– which I'm just starting -– that this number simply represents the maximum value that the 16-bit bug-tracking layer can support.

There are various definitions in the software engineering literature for software woes such as bug, error, fault, failure, etc. Some differentiate based on the software development phase — compile, test, and usage — in which the problem arises. Why is software defective? Because people are imperfect. How does it become defective? Through errors of commission (normal defect injection including the obvious things such as incorrect program logic, mismatched comments, etc), errors of omission (missed opportunities at improvement, forgetting to cater for all error conditions), and errors of mission (unnecessary complexity, re-inventing the wheel — badly).

Certainly most programmers do not intentionally write defective software. Clearly our software development process should seek to minimize the number of defects in the product as early as possible.

This leads to the Defect Equation:

`Di = Di-1 + Ii - Ri for i = 1,2,3,...`

The number of defects remaining in the product after the ith development phase (Di) equals the number of defects remaining after the previous phase (Di-1) plus the number of new defects injected during the ith phase (Ii) minus the number of defects removed during the ith phase (Ri).

The goal is to minimize Di in the last iteration. That is, minimize the number of defects in the final version of the software that is released to users. Certainly we may begin with initial condition D1 = 0 (undeveloped software contains no defects). The problem is to estimate the number of defects still to be discovered (lower bound for the number still remaining).

After the rather somber tone of the previous paragraphs — defects are serious business — we'll lighten up a bit. Instead of defects — an engineering term that somehow legitimizes them — we'll talk about nasty bugs — what programmers introduce. So, can we estimate the number of remaining bugs?

One of the national sports in Belgium is to try to drink at least one of each brand of beer produced in the country. The difficulty with this sport is that it's hard to get a notion of progress, due to the large and generally quantum nature of the beer market. For instance, there are some brands so rare that it is impossible to both observe and drink the beer at the same time. There are also, in Belgium, some brands so strong that it is difficult to drink and do anything useful at all at the same time, or afterwards, where "afterwards" is generally defined as "oh my god, did I do that?!"

(The other national sports being, in order of popularity: football, watching football, and throwing bottles at policemen after watching football. Apart from the bottles there is absolutely no connection between these sports and the honorable and ancient sport we describe here.)

It's a little-known fact, but the number of beers still waiting to be drunk can actually be estimated. Let's say that the new tavern you are visiting boasts 200 brands of bottled beer. Now, you've 100 different brands on in your little black book — a vital accessory for the serious participant. By comparing your list with the pub's menu, you see that half of the beers you've tried are provided in this pub. This tells you that statistically, you've tried about twenty-five percent of all known beers. So, there are perhaps 400 beers in total, and thus 300 more to try.

Of course, the estimate gets more accurate the more pubs you visit. We can apply this important lesson to the problem of estimating the number of remaining bugs. When we are producing a release of product X we'd like to be able to make a continuous estimate how many bugs there are in it so that we can apply the defect formula. This is actually crucial knowledge, since the technical support effort depends largely on this. Let's say that during the first month of testing a product, we found B bugs ourselves. Our helpful beta tester users found R bugs, of which S were among those we also found. The number of remaining, unknown bugs, U can be estimated as:

`U = (R / S - 1) B + S - R.`

If I already knew about B=3 bugs, the testers found R=7 bugs of which S=2 were already know to me, an estimate of the number of unknown bugs is:

`U = (R / S - 1) B + S - R = (7/2 - 1)*3 + 2 - 7 = 2.5`

If I didn't know about any bugs (B=0), the formula becomes

`(R=7, S=0, B=0) U= (7/0 -1)*0 + 0 -7`

which is completely indeterminate because 7/0 is infinitely large — or more precisely: undefined. So it is vital that there are some bugs for our estimate to have meaning (our programmers would love this).

We've used the above “bug-formula” in our own development work to great effect. It really works.

People have sometimes asked where the bug formula comes from, or how it can be derived (implication is: it can't be true). Here is our derivation of it:

`Bug formula: U = (R / S - 1) B + S - R.`

Let us try to follow the steps in deriving the formula. Imagine the total “bug space” is like a large board depicting all bugs. Known bugs are blobs painted black and unknown bugs — before the test — are blobs painted red. The testers throw darts at the board and either hit a black blob or a red blob — we hope they hit the board and discount all throws that do not hit the board. The number of throws, R, is the number of bugs found. This is a certain fraction, n, of the total number of bugs, T, so:

`R = n T`

Of these R bugs, the number that are black is, everything else equal, the same fraction of all the black blobs — this is actually an important assumption — so the number of darts that hit known bugs is:

`S = n B`

Eliminating n from the last two equations yields:

`R B = S T`

Solving for T gives:

`T = R B / S`

provided that S (and hence B, since B >= S) is not zero. This is the reason that we must insist that B > 0, that is, that there must be known bugs in the product given to the testers. The easiest way to ensure that is simply not to fix the last B bugs discovered during our own testing. Another way is to insert artificial bugs for the testers to find. This latter approach has the difficulty that it is hard to think up realistic bugs.

Now, we can get another equation for the total number of bugs, T, by saying that T is the sum of the number of known bugs, B, and the number of unknown bugs X, both counted before testing starts:

`T = B + X`

Since the testers find R bugs, the number of unknown bugs, U, after the testing, is smaller than X by the number of new bugs found, which is N = R - S:

`U = X - N = X - (R - S) = X + S - R`

Since X = T - B, and T = R B / S, we get

`U = R B / S - B + S - R`

which can be simplified to:

`U = (R / S - 1) B + S - R`

QED.

## Comments