A Response to “Why Most Unit Testing is Waste”

A few months ago I came across the article Why Most Unit Testing is Waste by James O Coplien. The title is an accurate description of the contents – James considers most unit tests to be useless. He expands his arguments in the follow-up article. I was quite intrigued, since I get a lot of value from unit tests. How come we have such different views of them? Had I missed something? As it turns out, I was not persuaded by his arguments, and here is my response to the articles.

The main thesis in the articles is that integration tests are better than unit tests, to the point where integration tests should replace unit tests. I agree that integration tests are effective, but I think a combination of both is even better.

When to Unit Test

In my experience, unit tests are most valuable when you use them for algorithmic logic. They are not particularly useful for code that is more coordinating in its nature. Coordinating code often requires a lot of (mocked) context for unit testing, but the tests themselves are not very interesting. This type of code does not benefit a great deal from unit testing, and is instead better tested in integration testing. For more on the different types of code, see Selective Unit Testing – Costs and Benefits.

An example of algorithmic logic is a pool of temporary numbers in a mobile phone system. The telephone numbers in the pool are used for routing calls in the network. Typically a request is made to get a temporary number, it is used for a few hundred milliseconds, and is then released again. While it is used, it is marked as busy in the pool, and when it is released, it is marked as free in the pool (and can then be handed out for another call). Because there is no guarantee that the release request will be received (due to network errors), the pool needs a time-out mechanism as well. If a number has been marked busy for more than say 10 seconds, it will time-out and be marked as free, despite not receiving a release request. The pool functionality is well suited for unit testing. One way to handle the time aspect is to make it external, see TDD, Unit Tests and the Passage of Time.
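To make the example concrete, here is a minimal sketch of such a pool in Python. The class name NumberPool, its method names, and the choice to pass the current time in as a parameter are my own illustration, not code from a real system; the 10-second time-out is the figure mentioned above.

```python
class NumberPool:
    """Hypothetical pool of temporary numbers with a busy time-out."""

    def __init__(self, numbers, timeout_seconds=10.0):
        self._timeout = timeout_seconds
        # Maps each number to the time it was handed out, or None if free.
        self._busy_since = {number: None for number in numbers}

    def acquire(self, now):
        """Return a free number and mark it busy, or None if none is free."""
        for number, since in self._busy_since.items():
            # A number is available if it is free or has been busy too long.
            if since is None or now - since > self._timeout:
                self._busy_since[number] = now
                return number
        return None

    def release(self, number):
        """Mark a number as free again."""
        self._busy_since[number] = None


def test_pool_exhaustion_and_timeout():
    pool = NumberPool(["555-0100", "555-0101"], timeout_seconds=10.0)
    assert pool.acquire(now=0.0) == "555-0100"
    assert pool.acquire(now=1.0) == "555-0101"
    # Both numbers are busy, so the pool has nothing to hand out.
    assert pool.acquire(now=2.0) is None
    # A released number can be handed out again.
    pool.release("555-0100")
    assert pool.acquire(now=3.0) == "555-0100"
    # 555-0101 was never released; after more than 10 seconds it times out.
    assert pool.acquire(now=11.5) == "555-0101"


test_pool_exhaustion_and_timeout()
```

Because the time is an argument rather than a call to the system clock, the test can move 10 seconds forward without waiting for them.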

Why Unit Test?

Well-tested parts. When testing the complete application, it is an advantage to use well-tested parts. It is the classic bottom-up approach of composing functionality from simpler parts. In the example above, if the pool functionality has been unit tested, I can concentrate on making sure it works well with the rest of the system when testing the complete system. If it was not unit tested, I might still find problems with how the pool works, but it could take more effort to find where in the code the problem is, since it could be anywhere.

Decoupled design. When you design the building blocks of your system to be unit tested, you automatically separate the different pieces as much as possible. If you don’t, unit testing becomes quite difficult, often requiring the set-up of a complex environment. I was quite surprised at how much better separated my code became when I started writing unit tests as I was developing. This is also why retrofitting unit tests is so hard – the parts of the pre-existing system are usually all tangled together.

Rapid feedback. Sometimes the complete feature you are working on may take a while to finish, or it requires other parts to be done first. Thus it is not possible to integration test it at each step of the way. With unit tests, you get feedback immediately if you make a mistake. For example, I never make off-by-one errors anymore. As soon as I write the code, I have tests that check the values in the critical range to make sure it works correctly.
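As a small illustration of what I mean by checking the critical range, here is a sketch of a time-out check like the one in the pool example, with tests on both sides of the boundary. The helper name has_timed_out and the use of whole milliseconds are assumptions made for the example.

```python
TIMEOUT_MS = 10_000  # the 10-second time-out from the pool example


def has_timed_out(busy_since_ms, now_ms, timeout_ms=TIMEOUT_MS):
    """True if a number has been busy for more than timeout_ms (hypothetical helper)."""
    return now_ms - busy_since_ms > timeout_ms


def test_time_out_boundary():
    # Just below, exactly at, and just above the limit.
    assert not has_timed_out(0, 9_999)
    assert not has_timed_out(0, 10_000)
    assert has_timed_out(0, 10_001)


test_time_out_boundary()
```

If I had written >= instead of >, the middle assertion would fail the moment I ran the test.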

Context. Sometimes it is easier to set up the context you want in a unit test than it is to set it up in the complete system. In the example with the pool, it is easy to create a small pool, fill it up with requests, and then check the behavior when there are no free numbers in the pool. Creating a similar situation in the complete system is harder, especially when the time the temporary numbers are in use is small.

Flawed Arguments

I think James misunderstands or misrepresents unit tests in several ways:

Delete tests that haven’t failed in a year. James argues that unit tests that haven’t failed in a year provide no information, and can be thrown out. But if a unit test fails, it fails as you are developing the code. It is similar to a compilation failure. You fix it immediately. You never check in code where the unit tests are failing. So the tests fail, but the failures are transient.

Complete testing is not possible. In both the original and follow-up article, James talks about how it is impossible to completely test the code. The state-space as defined by {Program Counter, System State} is enormous. This is true, but it applies equally to integration testing, and is thus not an argument against unit testing.

We don’t know what parts are used, and how. In the example of the map in the follow-up article, James points out that maybe the map will never hold more than five items. That may be true, but when we do integration testing we are still only testing. Maybe we will encounter more than five items in production. In any case, it is prudent to make it work for larger values anyway. It is a trade-off I am willing to make: the cost is low, and a lot of the logic is the same, whether the maximum usage size is low or high.

What is correct behavior? James argues that the only tests that have business value are those directly derived from business requirements. Since unit tests are only testing building blocks, not the complete function, they cannot be trusted. They are based on programmers’ fantasies about how the function should work. But programmers break down requirements into smaller components all the time – this is how you program. Sometimes there are misunderstandings, but that is the exception, not the rule, in my opinion.

Refactoring breaks tests. Sometimes when you refactor code, you break tests. But my experience is that this is not a big problem. For example, a method signature changes, so you have to go through and add an extra parameter in all tests where it is called. This can often be done very quickly, and it doesn’t happen very often. This sounds like a big problem in theory, but in practice it isn’t.

Asserts. James recommends turning unit tests into asserts. Asserts can be useful, but they are not a substitute for unit tests. If an assert fails, there is still a failure in the production system. If it asserts on something that can also be unit tested, it is better to find the problem when testing, not in production.
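A minimal sketch of how I see the two working together, using a made-up helper (percent_busy is hypothetical, not from any real system): the assert guards the invariant whenever the code runs, while the unit test makes sure the edge values are exercised before the code ships.

```python
def percent_busy(busy, total):
    """Share of the pool in use, as a whole percentage (hypothetical helper)."""
    result = busy * 100 // total
    # Run-time invariant: it only fires when this code runs with bad values,
    # which may well be in production.
    assert 0 <= result <= 100, f"percentage out of range: {result}"
    return result


def test_percent_busy_stays_in_range():
    # The unit test drives the edge values through the code during development.
    assert percent_busy(0, 50) == 0
    assert percent_busy(25, 50) == 50
    assert percent_busy(50, 50) == 100


test_percent_busy_stays_in_range()
```

The assert and the test check related things, but only the test decides when the check happens.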

Conclusion

Despite not agreeing with James on the value of unit tests, I enjoyed reading his articles. I agree with many things, for example the importance of integration testing. And where we don’t agree, he made me articulate more clearly what I think, and why.

Unit tests are not a goal in themselves. They are not useful in every situation. But my experience is that they are very useful a lot of the time, and I consider unit testing one of my key software development techniques.

53 responses to “A Response to “Why Most Unit Testing is Waste””

  1. Hej, Henrik,

    Thanks for keeping the conversation going. I think you’re a bit clumsy with your interpretations, and I find some of your retorts are either rhetorically inept or just plain wrong. Well, communication is hard and while I guess you could have Skyped me in an agile way, you chose comprehensive documentation over individuals and interactions. Let me try to clarify here.

    You seem to arbitrarily distinguish between transient failures and failures. I make no such distinction, and I find nothing in my writings that would lead one to believe there is such a distinction. A transient failure is a failure and, to me, indicates that the test should be retained. I would retain the test in that case, as I say in my article. I think maybe you didn’t understand that. Why do you, seemingly arbitrarily, discount this as a flawed argument?

    In terms of what parts (sequences) are used, it’s trivial to see that the waste is much higher if exercising the interfaces at the unit level than exercising external system interfaces: that’s just simple information theory.

    “My experience is that this [refactoring breaking tests] is not a big problem” is hardly central to my argument; it is a side discussion about what “refactoring” is: see the recent exchange between myself and Russel Winder on Facebook. It just relates to an unsustainable argument for the use of unit tests under what is an unreasonable application of refactoring.

    Of course I agree it’s better to find the problem in-house than in production. I don’t say otherwise. But you fail rhetorically by making this an either-or proposition, and it’s not. Asserts find problems in-house, too. But they also protect your customer in the event you don’t cover all the combinations in house. Both of our approaches will find the same bugs in-house, but I guess that where we differ is that you prefer that your customers corrupt their data, whereas I prefer that a test-cum-assertion guards against the corruption.

    I’m guessing that you unit-tested this site. Each of the credentials boxes below (when posting a reply) works well individually. I’m sure that they worked under the unit tests you ran in-house. But when I fill in my email the box label for the “Name” field disappears. I end staring at an empty box, wondering what to put in there. Another delivered bug for Crisp 🙂

    And of course, this discussion of unit tests pales by comparison to thinking at the systems level. By analogy, this site has no preview, no notation about paragraph formatting or other use of HTML tags. Ah, WordPress processes and tools over individuals and interactions.

    Hope we take this up on a panel at a conference some day.

    • Hi James,

      Thanks for commenting! I think a blog post lets more people in on the discussion than a Skype session.

      Point taken about transient failures. How do you keep track of if a given unit test has (transiently) failed in the last year?

      One of your bullet points states “Turn unit tests into assertions.”

      As for WordPress, I am but a user – I have no idea how they test the code 🙂

  2. Hi, again Wayne!

    | I think a blog post lets more people in on the
    | discussion than a Skype session.

    No big deal. It just would’ve been good to make sure we were on the same page so when you started shooting, you would be shooting at the right target. But I value the opportunity for public discussion and honour that 1. you took the time to post your thoughts; 2. you graciously post my cranky replies and, 3. you continue here in the discussion. Hats off to you.

    | Point taken about transient failures. How do you keep track of
    | if a given unit test has (transiently) failed in the last year?

    I really don’t care. An obvious place would be in your TMS. But a simple two-column spreadsheet would work just about as well.

    | One of your bullet points states “Turn unit tests into assertions.”

    Right, but assertions should still check the same things that the unit tests did in the first place. In most cases they can do even better because when writing assertions we tend to think in terms of broad invariants instead of case-based reasoning. There are a few classes of unit tests that are awkward with assertions, but I’ve never been limited by that. In the end, keeping the checks in the delivered code makes a difference only in the quality of your customer data.

    So why don’t people do this? Are they stupid? As I related in Chapter 1 I once had a boss who told me to pull out all the assertions when we shipped the code. It would have been trivial to tie assertion failure to an automated bug report, but he explicitly stated that he didn’t want the company embarrassed by having the software fail in the field for an internal error. That is, he valued his reputation above the customer getting good data. (Bad data bore a $5 million per-instance price tag in that system.) So I left the company. (Well, he *was* stupid, but that’s beside the point.)

    There are, I think, deeper and more sinister reasons that people don’t do this, and I’ll be talking about them in my FiSTB keynote coming up shortly.

    | As for WordPress, I am but a user – I have no idea how they test the code 🙂

    As an end user of your ‘blog, I read that as, “Screw the users — let them deal with WordPress directly.” 🙂 So, when you write software, do you write it all yourself? Naw, you probably use somebody’s libraries or at least their compilers. Consistent with your behaviour here, I guess you don’t test them. And if the customer complains you pass them on to the library vendor? Wow! That’s efficient. It saves you all that messy work in dealing with bugs in third-party software.

    You delivered software here (a web site) without testing it at all? Geez, we should get to good design, and then doing any testing at all in the first place, before taking up the angels-on-the-head-of-a-pin discussions about unit testing 🙂

    It’s nonetheless a great anecdote for Chapter 3. I’ll be sure to pass it on to my readership 🙂 With full credit 🙂

  3. If an assert hasn’t fired for a year, should it be removed? Presumably the assert isn’t telling you anything. Equally, an integration test that hasn’t failed in a year isn’t telling you anything either.

    • Jack – actually, that’s an interesting question. As for integration tests, the same principle applies. (“A year” is a parameter, and will differ from app to app.)

      As for assertions, the answer depends on the product’s risk averseness. Look at the ratio of NPV of the test divided by the risk associated with failure of the corresponding business predicate. In evaluating the answer it’s important to remember that the consequences are different for tests than for assertions. For a given behavioural predicate X there will always be latent risk associated with X if it’s only a test (unit or integration — I don’t distinguish), but you can drive that near zero with assertions for the same X. The results are always large in any calculation with zero as a denominator 🙂

  4. Oh, and one more thing — unit tests bear the recurring long-term cost of repeated execution. Assertions can be driven as part of your acceptance testing effort, more or less for free.

  5. This is to @James O. Coplien

    Why would you ever remove a test or an assertion?

    The fact that it hasn’t failed in a year is a sign of a lack of regressions – but there are so many external factors involved in large systems that this statement is simply not valid.

    Consider long-term update strategies of underlying dependencies. How often is an OS or language upgraded? Many of our unit tests require built-in language functions (mcrypt, array functions, ternary operators and so on). Whilst not strictly the function of the Unit Test it does quickly highlight any issues, as software itself and the environment that it is running on are one and the same.

    In my experience unit tests are extremely important, often more so than their primary goal. A new unit test covering a regression is great from a customer journey perspective.

    Unit testing also forces developers to lower the cyclomatic complexity of their methods. The quality of code I have seen when a developer knew they had to cover all scenarios and provide 100% test coverage of public methods, vs. code with no tests – is like chalk and cheese. In my opinion unit tests are a way of guiding developers, with integration tests and full mocking a way of joining the dots and proving the user story is satisfied.

    Both are equally important; one internal facing, one external facing. Neither should be discounted.

  6. Hej, Stuart — thanks for a thoughtful question. I think I want to answer it in chunks.

    Of course the unadorned claim of “one year” is an abstraction, but for me to go into the math of formal risk assessment would have been too tedious for most. And you’d never get the return on the time you invest in doing such calculations for each test. Adopt your own rule of thumb.

    The “one year” is input to the risk calculation but is not the sole driver for removing tests. If your tests are free — they run in zero time and cause no delay in the development process, require no maintenance effort, and so forth — you’re halfway there. You can get the rest of the way there by assuring that each test reflects an agreed value proposition. Unfortunately many unit tests (and other kinds of tests as well) don’t make the cut and should be removed. But even those that make the cut may create an overly large test inventory. There are an infinite number of possible unit tests on any API, so you can add tests indefinitely. And some people (we call them “test-infected”) do. This takes you into silly territory that is well-described in the literature (see Weinberg’s “Perfect Software” for lots of real-world examples). It’s that very example that Richard from Sogeti presented to me and which started my latest inquiry into unit testing, and this series of articles. His unit test maintenance had become greater than the software maintenance without concomitant value.

    If you find that your tests don’t run in zero time and that they are starting to cause the edit-compile-test cycle to be too long, then you need to cull tests. (If “too long” is longer than a minute or so, and certainly if it is longer than 5 minutes, you should probably not be calling yourself agile and might even consider artificially limiting the frequency with which people can submit and test to avoid blind iteration — expand it to 30 minutes or so.) If you find that every code edit requires an awkward and expensive test edit, same thing. You need to cull them in part according to their weakness at mitigating risk. Long mean-time-to-failure is a strong indicator, particularly at the unit level, of low value.

    The argument is slightly different for unit tests than for assertions, since unit tests require a separate process step that constitutes a context switch, which can be forgotten, or which the project can decide to skip under duress (just about every shop I visit has its own set of tests sitting on the shelf waiting for the day that never comes when they’re brought up-to-date with the software and returned to the active list—Why do you think the maven.test.skip property exists?) For assertions I’d formulate your question in a slightly different way: Why would you ever remove a check in your code that checks for bad user input, division by zero, a null pointer, and so forth? These arguments make sense for assertions but are more difficult to justify for unit tests alone. Moving this logic into assertions in any case provides the same risk mitigation that the test would have assured, and then some.

  7. Hej, again, Stuart — part 2 of 3:

    I distinguish between “test-first” unit tests and regressions. First, regressions have been studied and proven to have value. Second, I discourage unit regression tests. Regressions should test at least for business faults and certainly for business faults. You want to articulate the regression at the level of its business significance, and business requirements rarely appear at the unit level, so most regression tests should be at the system and integration level.

    “In my experience, unit tests are very important.” I am very interested to hear the data on this. You need to approach it from three angles. First, how much value have they added to the business relative to the cost — that is, how do you know that they don’t generate negative ROI? If you can’t give me a hard number I can argue quite convincingly that it’s negative. Second, even if you can give me a number, how much more cost-effective is it than code inspections, reviews, assertions, more disciplined design, and the other techniques that Capers Jones describes? No fair giving any numbers without having tried it and measuring it, because the numbers in the industry literature would otherwise likely discount your answer. Third, how do you know what unit tests are _not_ achieving for you, and how do you measure that? (Please don’t make me laugh by saying “code coverage,” or at least give a model that undermines the model I laid out in the two chapters I posted.)

  8. Stuart, I’m saving the best for last.

    Yes! Unit testing does lower the cyclomatic complexity of methods! Or, at least it does if you’re using an almost meaningless metric like code coverage. (It isn’t totally useless because it does tell you what code you haven’t tested. Most people think that coverage tells you something about the code you have covered. It really doesn’t (the information — in the full, formal sense of the term — that it provides is vanishingly small), but I describe that in the article.)

    In the article I also describe what I have seen empirically when people try to do this. First, they oversimplify: instead of driving their designs from business considerations and possible use case alternatives they drive it from programming considerations and models of computation: That makes sense only if your computational model is a perfect fit for your business model. Unit method execution is an almost optimally poor fit for a business model reflected in objects. Classes are bad enough; reasoning about system correctness at the method level is just formal nonsense.

    “Proving the user story is satisfied”? No such thing. First, there is no such proof. It is only a sample. Tests prove very little and they certainly don’t prove correctness. Second, if it’s an OO unit test, it tests one method of one object playing one role in a use case of several objects playing several roles. That a method works says nothing (again, the information is vanishingly small) about the business behaviour (use case or user story). Third, because of polymorphism, there is no “connecting the dots” at test-writing time. If you have even trivial class hierarchies (e.g., three deep) and a trivial number of objects involved in a use case (e.g., three or four) the combinatorics of method combination defy any kind of cognitive chunking. There’s too much uncertainty (method dispatching) at compile time to be able to do that. And if you’ve reduced the hierarchy to 1, or have just one object involved in a use case at a time, then you’re not doing object-oriented programming. In that case, yes, unit testing makes sense as I say at the very beginning of my first posting. But, then, why in the heck are you using methods and object orientation instead of FORTRAN functions? Hmmm, maybe JUnit doesn’t work with FORTRAN, and you’d end with Green Bar Fever 🙂

    In summary, I think your arguments are more from a personal feeling, based in expectations of testing and occasional successes, than they are from any rational perspective by which I’d run a business. Have another read of Jerry Weinberg’s book, and make sure you understand the arguments in my articles, and come back again.

  9. Ah, I posted before I went back and talked about the other problem with reducing cyclomatic complexity. But I already talked about that in my article, too, regarding the client with the maturity metric. What I see happening again and again (this is real, typical practice — not just theory) is that people split the methods up to achieve the coverage metric (tantamount to reducing the McCabe numbers) and you end up with method explosion. That ends violating tons of design rules like the Law of Demeter and it leaves your code unmaintainable. And the tests are so hopelessly under-contextualized that their individual successes and failures carry very little information. The bugs end being not within the methods, but between the methods. This is the hardest thing to learn for those with a reductionist worldview.

    But if your methods are so complex that you need McCabe numbers in the first place, you’re doing something horridly wrong that testing won’t fix. Good object methods are small and their semantics highly compressed; this technique results in poorly compressed methods that often bear little relationship to the domain of their class, but which are only artefacts of a mechanical refactoring process. Even if you used this technique as an interim way to reduce the McCabe numbers, you’re likely to do that in a way that creates even more horrid designs, because the methods reflect poor coupling and cohesion. To do this properly requires additional extrinsic information. Do a study where you measure coupling and cohesion together with the McCabe numbers and then do a regression analysis on the graphs. I’m almost certain you’ll find that solving this problem causes worse problems elsewhere.

  10. Håkan Söderström

    Hi Henrik, for what it’s worth I support the substance of your points. The advice to throw away tests that haven’t failed in a year is downright silly. It casts a shadow of doubt over all the original article, even though it presents several arguments well worth considering. /Hakan

  11. Håkan,

    It’s really hard to argue with such a compelling rationale as: “that is downright silly.” I can’t tell you how much I appreciate such a thoughtful reply, and I’m sure that as a tester you show the same professionalism to those whose code you test.

    This whole quest into unit tests as waste started when a client, Richard Jacobs of Sogeti, found that they had massively more test code than delivered code and that its maintenance was weighing them down. Further, the value from most of the tests was small. He asked advice for how to go forward.

    One of the speakers at the FiSTB conference last week in Helsinki related more or less the same story to the audience from his own experience. Test mass tends to grow without a concomitant increase in value.

    Both these guys had real numbers about the cost and benefit of increasing their test mass. Can you present numbers, or “an argument well worth considering” about why we should honour your pronouncement of “silly”?

    Have a good read of the testing literature (e.g., “Perfect Software” by Weinberg or “Lessons Learned in Software Testing” by Bach et al), which notes that instead of automating tests, you should be attentive to the testing itself and use each test as a way to develop further tests. There is not a point where you say “enough.” It is not just a matter of writing new tests when there is new functionality. Given that you are doing good testing this way your tests will grow without bound.

    Much of the time I have seen people manage their test inventory by throwing away a test the first time it fails (my articles cite exactly such a case), which is throwing away information. In any case, a proper approach to testing leads to a boundless generation of tests, and you need some criterion by which to judge throwing them away.

    The “one year” is a parameter and I count on your intelligence to modulate it. But at some point, a risk-based approach to testing will decide that enough is enough. It takes an intelligent tester to know when to retire a test. I think that one of the reasons that testers get the reputation of crank-turning monkeys is that they avoid this depth of thought and, out of fear, equate quality with test mass. Words like “silly” are emotive words, and the arguments of people who use them are driven by fear. My arguments are driven by information theory. Can you tell us what drives your arguments?

  12. Can’t help but notice how James pretty much is a lean developer.
    Lean will be popular because the industry is slowly starting to realize how all of these “processes” (ex: adding pointless unit tests just to have “coverage”) are nothing more than red tape that kill velocity.

    I’m having a similar problem with my current client. One of the developers always programs the general case. We always program what the business requires. Conflict ensues. I think because there’s a culture of equating solving the general case with quality code. James points out that everything should always be relative to what the business wants. This is the mentality that’s become ingrained after years of managing a hundred-million dollar system with a team of 6 developers. The results that I’ve seen are quality software (relative to the business), with way fewer developer resources, and done way quicker.

    As for unit tests, the issue I’ve always had is the cost of keeping up-to-date tests versus the bugs that they find. A related issue is the occasional problem of having to make the code ugly just to make it unit testable (which James also points out.) Looking at unit tests from the lean point of view, the perspective that I’m gradually fleshing out is that you should test business use cases (primarily). That results in a smaller (hopefully more manageable) set of tests. And it puts a strong emphasis on *system* tests.

    Another way of looking at it, is “If a bug happens and the business doesn’t care, is it a bug?” At that multi-million shop, the answer was always “No.”

    • Yup; thanks, ikaruga2099. For more on my lean perspective, see “Lean Architecture.” It goes beyond the all-too-often superficial view of “lean” to the deep foundations that come from the Toyota Production System.

      • @James Thank you for sharing your insights. I have a question. In our TDD environment, developers will write “expectation” unit tests, e.g., that an exception is thrown on illegal state. In the code that’s being tested, assertions are in place to test for illegal state and then throw an exception. If I understand your proposal correctly, you would remove such a test, because the assertion is in place in the program code. If this assertion code is somehow tampered with, our test would then fail and we would, thus, be aware of broken code. How does your suggestion ensure the same? Many thanks.

      • As usual, the answer is: It depends. There’s nothing in what I said that would automatically cause you to remove the test. It depends on: 1. Whether you are responding to object state or system state; 2. Whether it is actually an exception or is an interrupt, and 3. How the program’s fault tolerance architecture is laid out.

        In general, I hold exceptions to be a bad idea. Their use in C++ was largely a technical solution to the political problems of library vendors’ inability to come together on error handling interoperability while still needing interoperability of their library. Exceptions are just hyper galactic GOTOs. With good programming discipline they can be replaced by return codes (or, sometimes, by callbacks) and handled locally. If you’re throwing an exception across more than two activation records, you probably can’t reason about the code.

        I can imagine some instances where you need to test the system’s ability to recover from an illegal state to a sane state, but I’d test the system state rather than go to the level of looking for the exception. The exception is the mechanism; the behaviour to be tested is the state transition. That is a matter of testing system state. That means testing that all intervening resources are cleaned up, for example.

        If you have exceptions with resumption, I don’t know what to tell you, except that you’d better have one heck of a good set of programming disciplines that keep your code simple, simple, simple.

        Interrupts, on the other hand, you need to handle. (The distinction is whether the signal is asynchronous with respect to the normal progression of execution.) In most applications they cross many spheres of influence in the architecture and local (unit) testing is almost impossible.

        In summary, it’s hard to generalise, but asking whether unit testing of exceptions makes sense sounds like asking if two wrongs can make a right.
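
        The suggestion above, to test the recovered system state instead of looking for the exception, could be sketched roughly as follows. This is only an illustration: the `Session` class, its `begin` method and its `handles` list are hypothetical names invented for the example, and a return code stands in for the exception.

```python
class Session:
    """Hypothetical resource holder, invented purely for illustration."""

    def __init__(self):
        self.state = "idle"
        self.handles = []

    def begin(self):
        # Calling begin() while already active is the illegal state here.
        if self.state != "idle":
            self._recover()
            return False  # a return code, handled locally by the caller
        self.handles.append("buffer")
        self.state = "active"
        return True

    def _recover(self):
        # Recovery means releasing every intervening resource.
        self.handles.clear()
        self.state = "idle"


def test_recovers_to_sane_state():
    session = Session()
    assert session.begin() is True
    assert session.begin() is False  # illegal second begin()
    # Assert on the resulting system state, not on the mechanism.
    assert session.state == "idle"
    assert session.handles == []
```

        The test never inspects how the error was signalled; it only checks that the object is back in a sane state with its resources released.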

      • Finally, someone else that thinks exceptions are crap and unit tests are an exercise in ego stroking. I’ve been saying it for years. How can a programmer write a test that is smarter than his code? If they write crap system code, they write crap test code.

        These guys think TDD is great.
        http://nparc.cisti-icist.nrc-cnrc.gc.ca/npsi/ctrl?action=rtdoc&an=5763742&article=0&lang=en

        However, they apparently do not have a good grounding in statistics.
        http://trt.jaykimble.net/blogs/jacob/archive/2008/01/22/tdd-proven-effective-or-is-it.aspx

        I did stumble across a study of studies regarding TDD and unit testing and the conclusions are consistent. All the research suggests is “maybe”, “maybe not”.

      • David, thanks for adding some sanity here. It’s not only using unit tests that strokes egos, but writing about their virtues as well. Some people are born into beliefs they can’t shake and it’s hard when their beliefs are challenged. I would pity them if they were third-world folk, but they are consultants flaunting their position in ‘blogs with unsubstantiated or totally faulty arguments.

        If only there were a law… But sadly, there ain’t. I wonder some days if their clients actually deserve them.

      • David, thanks for commenting. It sounds like you don’t find that unit tests provide any value for you.
        For me it’s the opposite. I programmed for many years without using them, before TDD-style unit tests became popular with Extreme Programming (XP) 15 years ago.
        When I started using them, I got a lot of benefits, the biggest of which was better structured code.

        As for exceptions, Bob Martin (Uncle Bob) argues that the benefit of exceptions over return codes is that it makes it easier to separate the error handling code from the happy path code. See chapter 7 in Clean Code. A great book, that, incidentally, James Coplien wrote the foreword for. Of course, Uncle Bob is also a strong believer in TDD.

  13. P.S. — I just noticed that a copy of the original posting has been reproduced, with all of its mistaken interpretations and mistakes that I pointed out in my response of 4 September, in Japanese, on a Japanese web site (http://postd.cc/a-response-to-why-most-unit-testing-is-waste/). Of course, my corrections were not reported and there is no opportunity to post replies to the Japanese posting. I’d at least like to be put on notice of an opportunity to refute uninformed criticisms of my material. So: no conversation by Skype beforehand on the one hand, and a conscious attempt to limit feedback to one of my main markets on the other. So, Henrik, what should we do about this?

    • James, we disagree on the value of unit testing. Unfortunately I don’t find the tone you set in your comments to be very conducive to a productive discussion. Despite that, I’ve published all of your comments here. There is no “conscious attempt to limit feedback”. The site you refer to asked if they could translate my blog post, and I said yes. You’ll have to talk to them about what they want to translate.

  14. Henrik, I know that my substantiated arguments clash with your casual opinion. And I think I bent over backward to be cordial and constructive in my initial response; the record above attests to that. After finding that your sloppily critical view of my work had been posted in Japanese translation without letting me know, I found it more difficult to be generous here.

    You are personally responsible for the appearance of your work in translation, and I hold you accountable for publicly criticising my work — falsely, at that — without notifying me and without recourse to comment in that forum. Given the timing of the Japanese publication, you knew that some of your claims were wrong — you admitted as much in one of your responses to me above. Saying things about another’s work that you know to be untrue constitutes libel, so you have crossed a very serious ethical and moral boundary here that goes far beyond, er, “tone.”

    No matter: the doyen of testing in Japan, Kyon Mm, has already more or less excoriated you in a Japanese language posting, exactly for the reason that I present here. No further action is necessary on your part. This will take care of itself in terms of your Japanese reputation, and I think your own words here make your own level of astuteness and professionalism clear to English language readers.

    • Yes, your comments (and mine) speak for themselves.

    • Sir, Mr. James, are you for real? The whole discussion aside, Henrik has been more than cordial, and has been publishing all your en’ te’ responses as is, what else do you want? That Japanese blog? Well, go and submit a take down if you have such “accountability” issues, but don’t turn this brilliant discussion into a personal and ugly spat. Henrik has all legal and moral rights to post a response to anything he wants, and get it republished in 100 different languages. If you decide to respond to Donald Trump, and that gets attention, would you personally inform Trump or drop a formal letter to the white house? Joke aside, be rational and relevant.
      You sir, have been calling names and saying downright ugly things to Henrik which are never meant to be spoken as-is in a professional setting, so yeah it’s evident.
      That being said, you are a brilliant mind in your own way and I give it to you. Your arguments are good in your own understanding and experience, but sir you haven’t seen the world either. There are challenges across software companies, and some great DevOps practicing companies, including Amazon and Microsoft, have a plethora of unit tests, and they absolutely work wonders – and this statement comes from personal authority.
      I know I am responding to a roughly 3 year old post, but it is important to set the notion and make sure things are discussed in a positive setting, and not in the way of public bashing of either party.

  15. I’m not sure I can still address Jim here, but.
    1. I think most of the antagonism to the title and/or idea comes from people who misunderstand the difference between the various kinds of tests and believe that any test written by a developer who implements a JUnit TestCase is a unit test, while the regression/integration/system tests are done by QA.
    Actually, if I analyze the test code I’ve seen/written through my career, I believe most of it was not unit tests in the TDD sense. Most of those tests exercise an external API by means of the external API, and most of them use mocks or simulators of some external components.
    And I can say that I wouldn’t throw these tests away.
    2. I’d like to hear your opinion on dynamic languages, especially Python. In the last year I’ve started programming in it, and this language requires you to invest in 100% source coverage, otherwise you can miss syntactic errors when using classes. I’ve not formed my final opinion on it, since besides the syntactical problems Python is a very nice tool to work with, but my background in strongly typed, compiled languages (Ada and Java) causes me to abhor it when I see the syntactic errors only after I deploy the code (and we have a hard system with non-easy deployment).
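
    A small sketch of the kind of error described in point 2, assuming it refers to mistakes such as a misspelled attribute name. Python rejects genuine syntax errors when the module is compiled at import time, but a misspelled attribute is only detected when the line that uses it actually runs. The `Account` class is hypothetical and exists only for this illustration.

```python
class Account:
    """Hypothetical class, used only to illustrate the point."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            # Misspelled attribute: this fails only when the branch executes.
            return self.balnce
        self.balance -= amount
        return self.balance


def test_overdraft_branch_is_executed():
    account = Account(10)
    try:
        account.withdraw(100)
    except AttributeError:
        return "typo found by the test"
    return "no error"
```

    A test suite that only covers the normal withdrawal would pass, which is why coverage of every branch matters more in a language without compile-time checking of names.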

    • Hi, Nick,

      Yes, there is some of that — of people confusing checks with tests, unit tests with regression tests, or unit testing with automated testing. I think I carefully defined my terms and scope in the article, but no amount of explanation can waylay the perceptions that people bring to the table. I think that most of the naysayers would be well-advised to use some non-judgmental inquiry and to ask before jumping to conclusions: that avoids publicly making a fool of one’s self. More to the point, that is the whole psychology of a great tester: to explore puzzles and misunderstanding, rather than to impose one’s misunderstandings on a system. And I think such a posture characterises the lion’s share of the reactions to my articles. I saw only two well-reasoned responses in the many feedback mails and postings (three with yours). One of them, by the way, correctly pointed out that great TDD means continuously throwing tests away and creating new ones. TDD is widely misunderstood, which may be why people continue to advocate it. But that’s another topic, and TDD is dead, anyhow, right? 😉

      “Dynamic languages” is a big topic. I chose Ruby over Python some years back (it was a coin flip) and I love it. I feel I’m a disciplined designer, and I feel that I get a lot of mileage out of good up-front design. With Ruby, when I need the design flexibility that Java’s type system abstracts away, I have it. What Ruby lacks in its type system, I compensate for using assertions.

      In the end I prefer languages with stronger compile-time type systems. I think that the main advantage of such languages is that they offer me a disciplined way to express design concerns so the compiler can do the testing at compile time instead of me, at run time. That allows me to find errors more quickly, and that, in the spirit of TPS, reduces rework and waste. An extreme example is generative programming — writing algorithms and data structures (e.g., associative table lookup) using C++ templates. The code in isolation is tested by the compiler itself. Objective-C (my main hammer these days) strikes an interesting balance between C-like type checking, and a bit more, but with an incredibly flexible run-time environment if you need it. (I rarely do, and I completely avoid it for products that I commercially deliver.)

      Reflection is a key concept but, again, too many people confuse it with dynamic languages. C++ has interesting reflection mechanisms at compile time, and a few more at run time; Java has only uninteresting reflection mechanisms, and they are all at run time. Ruby is all run-time. That bodes for a little more flexibility and a lot less safety; the latter doesn’t bother a good designer in terms of quality, but it should bother development shops attentive to reducing rework and to finding problems early. Testing is the antithesis of finding problems early. I think unit tests are an attempt to push the problem detection earlier, but they lack the formalism of a compiler’s type system, and they lack the integration scope of system testing. That’s another thing that suggests that they might be a lose/lose proposition.

  16. Pingback: Why Unit Testing is Not Waste

  17. Anonymous Coward

    My take: the arguments are right, it’s just the conclusion that’s wrong.

    Just because many people still don’t do unit testing right doesn’t mean unit testing in itself should be dismissed as a useful technique.

    When you practice TDD, unit tests are a byproduct, not a goal in themselves. OTOH, once written, the cost to keep them around is minimal, when compared to an error they’d fail to detect – I don’t recall if it’s “Code Complete” or “Writing Solid Code” that states that a bug unfixed in one phase of development is ten times as expensive to fix in the next. So sure, you _should_ regularly prune your unit tests, and remove the ones which are no longer relevant, but why remove unit tests which do not fail, but still do test something relevant, as long as they’re so cheap to maintain?

    The main benefit of unit tests isn’t that they test your application. It’s that they provide the developer a really neat, controllable, closed scope environment for testing little bits of functionality. Picture this: you add a new piece to your code – let’s talk about one that isn’t very algorithmic – to signal changes in some model object by putting messages into a message queue. What you need to test are two things: that messages get placed into the queue when the business rules say they should (even if the message queue is far away from what the business talks about, it’s still a business requirement that arbitrary clients should be able to listen to model changes, just translated into a lower level language), and that the message payload correctly reflects the model changes. Automated tests at a higher level than unit testing would need to attach a message consumer to the message queue. Now picture the poor programmer testing this _without_ unit tests: firing up a local instance of the message queue, potentially without the ability to enter its code with the debugger, firing up an application server from a debugger, and writing a message consumer solely for testing purposes, which does all the inspection and assertion on message structure, and on whether all expected messages were received and no unexpected message was received in the process. And imagine how smart this consumer would need to be in order to be able to detect legitimate messages which aren’t generated by the test. And imagine how lengthy the debugging cycle would be. Compare this to calling the code you want to test from within a completely mocked and controlled environment. Obviously, the unit test using mocks throughout doesn’t say much about your entire application’s correctness, but it surely helps you debug code faster.
Besides, building up a correctly functioning application from bricks which are working correctly, for a given value of “correctly”, is more likely than hitting the right spot with components you have no idea if they’re correct or not, for the same value of “correctly”.
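
    The message queue scenario above could look roughly like this as a unit test. It is a sketch under stated assumptions: the `Order` model, its `set_status` method, the `publish` method on the queue and the message layout are all hypothetical, and the queue is replaced by a `Mock` from Python's standard `unittest.mock` module.

```python
from unittest.mock import Mock


class Order:
    """Hypothetical model object that signals status changes to a queue."""

    def __init__(self, order_id, queue):
        self.order_id = order_id
        self.status = "new"
        self._queue = queue

    def set_status(self, status):
        if status == self.status:
            return  # nothing changed, so no message is sent
        old = self.status
        self.status = status
        self._queue.publish({"id": self.order_id, "old": old, "new": status})


def test_publishes_message_on_change():
    queue = Mock()
    order = Order(42, queue)
    order.set_status("shipped")
    # Checks both that a message was sent and that its payload is right.
    queue.publish.assert_called_once_with(
        {"id": 42, "old": "new", "new": "shipped"}
    )


def test_no_message_without_change():
    queue = Mock()
    order = Order(42, queue)
    order.set_status("new")
    queue.publish.assert_not_called()
```

    Both requirements from the paragraph above are covered (a message is sent when the rules say so, and the payload reflects the change) without starting a queue, an application server or a test consumer.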

    Put in a more synthetic way, heavily unit-tested code makes all bugs found by higher level tests shallow and easy to diagnose and fix. Having higher level tests expose some obscure corner cases in very fine-grained, low level components doesn’t help much in fixing those bugs. Therefore, you are much more likely to get a reliable application from a heavily unit-tested code base with few higher level tests than from a code base that has a similar amount of higher level tests but few unit tests – specifically because the combinatorial complexity of the testing problem explodes as the system under test grows in size.

    The assumption that cheap and massive computing power makes it cheap enough for programmers to do most testing and debugging with high level tests is wrong. Running the full suite of functional and integration tests on a program may require ten or a hundred times more time than running the compile + unit tests cycle. Sure, since computing power is so cheap, the cost of running the higher level tests is still close to nothing. But for the programmer having a debug cycle of a second (the time required to hit a breakpoint with a debugger when using a unit test) versus one of two minutes (when trying to do the same in a non-trivial application using a feature test) isn’t at all similarly cheap.

    An analogy (a way less than perfect one, I agree, but still one which illustrates my point, I’d say): cars are built from parts. Each part – really, even simple components such as screws – is tested to some extent before the entire car is assembled. Imagine what it would mean, in terms of cost, to skip this entire parts testing, and only do an extensive test drive once the car is assembled.

    Yes, most unit testing is waste, if you only look at it from the point of view of ensuring application quality. It’s not at all waste if you look at unit tests as a development tool.

    And there’s one more reason not to give up on extensive unit testing, even if it mostly looks like waste when measured exclusively against overall application quality. It has to do with how the software development process works.

    Software development, IMO, is all about successive translations and enrichments of some piece of information. First, the customer tells you what’s hurting – he states the problem. Then, you (or a BA) model a solution in terms of the business domain. Then you translate this in terms of an emerging solution domain (DDD ringing a bell?). Finally, you are ready to start coding – the code being nothing else than a rephrasing of the solution in terms of the business domain, just in another language. Why would you not start using a language as precise as a programming language as early as possible? If you do this, you end up with unit tests before integration tests – and your unit tests actually reflect business rules. Keeping them around helps you have the compiler test your translation of the business domain model to the solution domain model upon each build. True, you still need to manually and mentally regularly verify that this translation is still valid, but that’s a much smaller jump than the one from business domain rules written in natural language to actual implementation – thus, a lot less of a risk for costly human errors.

  18. Anonymous Coward

    I may be wrong, but I believe I slowly start to understand what caused Mr. Coplien’s original article.

    Again, I may be wrong, but I get the impression that much of the experience Mr. Coplien relies upon seems to be with large, enterprise-class systems, where the actual business logic is split and munged across so many different layers and components that it’s no longer reflected in the methods of individual classes, or at least it is difficult to trace what methods do back to specific bits of business logic.

    I’ve seen such systems too – but that’s IMO a bad design to start with. It may be that unit tests for such systems are mostly waste. But then again, IMO/IME, the effort going into the development and maintenance of such systems is mostly waste in the first place – such systems do too little for how much they cost. Giving up on unit testing for such systems won’t change their cost effectiveness much. IME the cost of a new feature or a change to such a system is at least an order of magnitude higher than it would be if the system had a proper code structure, one reflecting the business processes in more detail.

    Refactoring such systems to a less brain-damaged architecture, where there’s business meaning in almost each method of almost each class, pays off big time – way more than continuing to roll a big ball of mud. (I’m not saying that all large systems are necessarily big balls of mud in the strict sense of the term, as defined by an IMO very insightful article from 1999. You can crochet a lot of accidental complexity into a system, making it both larger and harder to comprehend, while still writing very clean code. This would still be a big ball of mud, in a less strict way, IMO.)

    IME, the cost of refactoring such systems into a proper form would be offset by savings in development effort usually in less than a year, cost-wise, for medium-sized systems of several hundreds of thousands of lines of code and teams of usually less than 10 people, maybe a few dozen at most. (I know LOCs is a shunned metric – but it’s a good metric for overall complexity, be it accidental or essential, and it correlates well with effort. I’m not assuming effort to be equal or proportional to productivity.)

    Usually such an effort also yields a significantly smaller system – thinking in terms of business logic down to the class and method level naturally has you avoid/eliminate a lot of accidental complexity.

    (If anybody objects about the doability of such a refactoring, I do agree that most managers, having little or no understanding of programming, and having had bad experiences with programmers who are bad at high level architecture in the past, will not want to make resources available for such an endeavor, in case they happen to be in charge of a big ball of mud.)

    Once you’ve done such a refactoring, and your classes start to have business meaning, the difference in relevance between unit tests on the new code structure and feature tests on the old code structure, from a business point of view, won’t be that large – business rules will in effect be directly testable by unit tests. Feature tests will mostly only test integration – that’s what I thought when I wrote in my previous comment that bugs being detected by feature tests are shallow and cheap when the underlying components are extensively unit-tested.

  19. I was quite surprised by Jim’s replies, so much that initially I assumed it was just a troll impersonating him. The blog entry was hardly something to be offended by. I think his original article had merit, but I was very disappointed by the tone in the comments, like those utterly ridiculous jabs about not having unit tested the blog system or even his anger over someone responding to his article in public.

  20. In my experience from where I work, white box unit testing and TDD are totally incompatible. I am a big fan of blackbox testing, apart from scenarios where aspects of the software are out of your control (for example you are testing network processing and you don’t want to set up a server, or you want to test a GUI’s backend without clicking on the buttons).

    Where I work, we have huge areas of the code that cannot be tested (due to initial bad code layout, with functions that are thousands of lines long). We don’t allow code refactoring unless we can test it, but it’s so difficult to unit test code like that, so it’s a catch-22: we can’t refactor it.

    Another issue: I totally agree that whitebox tests that add overhead every time you restructure code are a nuisance. A lot of the arguments for unit tests could be solved using module tests – which are my personal favorites. You can use gcov to verify your code coverage, and also there are only a finite number of inputs and outputs. It only becomes a pain to use module tests when the functions are large, the combinations of inputs and outputs are complex, or some system architect is a total moron and has screwed up the design of the data, making data setup practically impossible.

    In my organization, having to write unit tests all the time, instead of blackbox testing, has prevented code restructuring, has prevented any testing of the code whatsoever in some libraries, has made development and code maintenance a complete and total nightmare, contributes massively to technical debt, and has resulted in all our development activities moving to India, where they just totally ignore the advice to unit test.

    There are situations where you should definitely unit test, but the dangers of overusing whitebox testing far outweigh the dangers of overusing blackbox testing in my view.

  21. Tests can only show the presence of bugs. The real problem is side effects.

  22. Rod Macpherson

    I say we adopt a new adjective for this style of testing that more intuitively reflects both its cost-benefit and aesthetic and cultural qualities. We then contrast this Kabuki style with basic integration tests (focused on business value) combined with assertions and the let-it-crash model. I’m betting the latter provides considerably more value.

  23. Pingback: Standing in the middle of life… – pragmaticiot

  24. It has been a really interesting reading, both the article by James O Coplien and the discussion on this page.

    I must say that after well over 10 years of development experience in different areas, I strongly agree with James’s points. Not the exact wording, but the meaning of them. It’s obvious that every point he makes has exceptions in the real world (I don’t think he is even denying it).

    It’s actually good that James made that obviously “wrong” statement of “Throw away tests that haven’t failed in a year.” People who actually read and think about the article understand that this is just a bad example (clearly, time is not relevant but the number of times that the test is run – and this is also subjective). This example clearly separates the people who just read the words (or the summary below) from the people who actually read and understand what he is really trying to say 🙂

  25. My problem is not with unit testing. My problem is with the elaborate software designs you have to have in order to perform unit testing. Mock objects, dependency injection, tons of layers, tons of interfaces that are only because of unit testing frameworks requiring them, etc. I shouldn’t have to create the golden gate bridge just to walk over a puddle of water.

    • I agree completely. Testing algorithmic logic doesn’t require any mocking at all. If the testing requires too much mocking etc, it is probably better to use end-to-end testing.
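
      As a rough illustration of algorithmic logic that can be tested without any mocking, here is a minimal sketch of the temporary number pool described in the post, with the time passed in from outside. The `NumberPool` class, its method names and the `now` parameter are assumptions made for this example, not code from the post.

```python
class NumberPool:
    """Sketch of a pool of temporary numbers with a busy time-out."""

    def __init__(self, numbers, timeout=10.0):
        self._timeout = timeout
        # Maps each number to the time it was marked busy; None means free.
        self._busy_since = {number: None for number in numbers}

    def acquire(self, now):
        # Hand out the first number that is free or has timed out.
        for number, since in self._busy_since.items():
            if since is None or now - since > self._timeout:
                self._busy_since[number] = now
                return number
        return None  # every number is busy

    def release(self, number):
        self._busy_since[number] = None


def test_busy_number_times_out():
    pool = NumberPool(["100"], timeout=10.0)
    assert pool.acquire(now=0.0) == "100"
    assert pool.acquire(now=5.0) is None     # still busy
    assert pool.acquire(now=10.5) == "100"   # timed out, handed out again


def test_released_number_is_reused():
    pool = NumberPool(["100"])
    number = pool.acquire(now=0.0)
    pool.release(number)
    assert pool.acquire(now=1.0) == "100"
```

      Because the current time is an argument, the time-out can be tested instantly and deterministically, with no mocks and no waiting.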

  26. Henrik: I found it funny that you disagree with Jim (“most unit testing is waste”), but still have it in your own response too: “WHEN TO UNIT TEST”.

    You have been around a long time, so you definitely know the buzz about “code coverage”, mocking etc. And then you come and say that unit testing is useful mostly for algorithmic code. I don’t know what kind of projects you have been working on, but sadly most big projects I have been working on have a huge amount of “non-algorithmic” methods and functions, and a really low number of “algorithmic” methods and functions.

    Let’s say we wanted to get that “100% code coverage” (which is just silly) using unit level tests. For that, we would need a huge amount of silly unit tests testing trivial functions with a “coordinating nature”. In addition to that, we would need some amount of tests for actual “algorithmic methods”.

    At least in that case, you would agree yourself, that “most unit testing is waste”.

    Of course, there are lots of problems with a code base that consists of a huge amount of silly coordinating code, wraps, mocks and whatnot. Part of the original reason for the sort of architecture we have was that some parts were designed for testability…

    However, most of our “algorithmic methods” are not algorithms themselves, but just fractions of algorithms. Again, it would probably be waste to test those fractions, instead of using integration level testing or end-to-end tests. In these cases, I don’t think that unit level testing would be that beneficial.

    I personally believe more in integration testing and end-to-end testing than in unit testing. With e2e tests you might refactor your whole code base without changing your tests, as long as your use cases are still working as they should.

    Maybe the problem here is that you interpret Cope’s “most unit testing is waste” as “all unit testing is waste, and produce no value”, which I can’t find myself from his article. While most unit testing (in general) might very well be waste, it doesn’t have to mean that most unit testing made by you (in particular) is waste.

    • Hi Risto,

      Thanks for commenting, and for disagreeing without shouting 🙂

      Yes, I agree completely that creating unit tests only to get to 100% code coverage is silly. As for design for testability – it depends on what you mean. I think it is very useful, but only when the resulting unit tests are not “silly”. Fractions of algorithms are still worth unit testing in my opinion.

      You make a very good point that most unit testing (not just my unit testing, but as practiced by everyone) may indeed be waste. However, the reason I wrote my post was that I thought James argued for fewer unit tests than I think are useful. I tried to give concrete examples of when unit testing is useful: it is helpful to have well-tested parts, it gives cleaner separation of the parts, it gives faster feedback and makes it easier to set up the test context. I also thought that several of his arguments against unit testing didn’t really make sense.

      But in the end, it’s a matter of degrees. How much unit testing is useful? I had hoped that this post would be a starting point for a discussion on how much of it is useful. Unfortunately, the discussion pretty quickly turned into a shouting match instead.

  27. Matthew Marcus

    “This is true, but it applies equally to integration testing, and is thus not an argument against unit testing.” — https://yourlogicalfallacyis.com/tu-quoque

    • For a non-trivial program, you will never be able to completely test it. But in my opinion that is not a reason to avoid integration testing, and it is not a reason to avoid unit testing either. What is your opinion on that Matthew?

  28. I was only pointing out the lack of logic in your argument. I also find your assertion that the problem applies “equally” to be inaccurate. I think Mr. Coplien’s first paper does a great job of making a statistical proof for that.

    Personally, I agree w/ Mr. Coplien’s assertions, and I whole-heartedly think you can get much closer to full logic-pathway coverage w/ end-to-end, or integration testing, and w/ less duplication of tests, and with less rigidity, than you ever will w/ unit-testing.

    Personally, I have yet to be “saved” from a regression through unit-testing (although I’ve spent a multitude of wasted time investing in them)… On the other hand, I can’t count the times E2E/integration testing has found regressions before a release of code to the wild.

    • Thanks for your reply Matthew. I still don’t think pointing out that the state-space is enormous means that unit testing is waste. Also, it is not a question of integration tests *or* unit tests. Even with unit tests you still need integration tests. I am not sure exactly what he proved, but I don’t think it therefore proves that unit tests are waste.

      If you don’t get any value from unit testing, you obviously shouldn’t use them. For me they provide a lot of value, so I use them.

      • Henrik:

        I think he tried to prove that MOST unit tests might be waste. You kind of agreed with that with your “when to unit test” chapter.

        The enormous state-space renders “code coverage” — a false metric often associated with unit testing in general and TDD in particular — completely useless, even harmful. Also that is the reason why architecture, clean code, best practices etc. are MORE crucial to the quality of code than automated regression testing can ever be — and that doesn’t mean that ALL automated regression testing is waste.

        The integration testing or e2e testing vs. unit testing thing is about getting more bang for the buck. Every single test costs you, and the lower level the tests are, the more they cost (to maintain) especially in case of refactoring the code. It’s more about being smart about what to test than embracing the attitude to not test. While most unit testing might be waste, some unit testing might be crucial. And maybe if you are smart with your unit tests, maybe most of your unit tests are important, too!

  29. Hi Risto,

    Yes, you are probably right, bringing up the enormous state-space was likely in response to code-coverage issues. It threw me off a bit, since no testing will ever cover every possible scenario. I agree with you. The examples I listed were all cases where I think the value of the unit tests outweighs the cost of keeping and maintaining them.

  30. This post is optional reading for the Berkeley course CS 61B Data Structures, Spring 2017: http://datastructur.es/sp17/

  31. Dmitri Sirobokov

    I read the original article a few times and disagree with a lot of the arguments. Basically, a well-written unit test over well-written code should not be a pain but a benefit over an integration test. And most people writing such blogs are referring to poorly designed code.

    Google suggests 70/20/10 split: 70% unit tests, 20% integration tests, and 10% end-to-end tests

    https://testing.googleblog.com/2015/04/just-say-no-to-more-end-to-end-tests.html
