TDD has had a rollercoaster ride since Kent Beck first coined the term in 2002. This is no surprise: TDD really has the power to be a significant force for change for a team of developers.
There are basically two camps, who don’t agree 100% on what the last “D” is. Kent Beck originally called it “Test Driven Design”, making the tests an integral part of the design process. More than half the people I talk to today (myself included) have trouble not calling it “Test Driven Development”, where the tests are not as strong a design driver.
Today, I am going to write about the “tests provide good design” camp. I am in good company, too. Uncle Bob and James Coplien have both debated this, along with many, many others. I am now going to weigh in on this, too. What arrogance. Oh, well. Here goes:
TDD, in the original meaning of the acronym, is a technique where tests are written first. Since the test is new, it will fail. The minimum amount of code that will make the test pass is then written, and the process is repeated. At some point, the developer may decide to clean the code up, in which case the tests must run successfully both before and after the cleanup. This refactoring of the code gradually improves the quality and the architecture of the solution.
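To make the cycle concrete, here is a minimal sketch in Python. The order-total example and every name in it are my own invention for illustration, not taken from any real project. The tests come first, then the least code that satisfies them, then a cleanup that has to pass the very same tests.

```python
def check_order_total(order_total):
    """The tests, written first. With no implementation yet, they fail."""
    assert order_total([]) == 0
    assert order_total([("book", 2, 10)]) == 20
    assert order_total([("book", 2, 10), ("pen", 3, 1)]) == 23


# First pass: just enough code to make the tests pass.
def order_total_first_pass(lines):
    total = 0
    for line in lines:
        total = total + line[1] * line[2]
    return total


# Cleanup: same behaviour, clearer structure.
def order_total_refactored(lines):
    return sum(quantity * unit_price for _name, quantity, unit_price in lines)


# The same tests must pass both before and after the cleanup.
check_order_total(order_total_first_pass)
check_order_total(order_total_refactored)
```

In a real code base there would of course be only one function, changed in place. The two versions sit side by side here only so that the before and after can be run against the same tests.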
Testing the wrong thing
A couple of weeks ago I was told that a team had testing well in hand. They had 97% test coverage. I immediately became very, very skeptical since I know the bug count for this team.
We developers are techies. This is what enables us to create very good applications. It is also one of our biggest blind spots. We tend to treat everything as a technological problem. When this attitude carries over into testing, we get nonsensical tests with names like “testConstructor”.
This is a really, really big problem. Your customer is never going to care about your constructors. What he cares about is the behavior he expects from the solution. This kind of problem was described by Dan North when he coined the term BDD, where the tests are explicitly modeled on customer-oriented statements. So – tests should test something the customer cares about, not the implementation. We may want to change the implementation in the future.
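The difference is easiest to see side by side. The sketch below uses Python's standard unittest module. The Account class and its withdrawal rule are hypothetical, made up purely for this illustration.

```python
import unittest


class Account:
    """Hypothetical domain class, used only for illustration."""

    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class TechnicalTest(unittest.TestCase):
    # Passes and raises the coverage number, but says nothing the
    # customer would recognise.
    def test_constructor(self):
        self.assertIsNotNone(Account())


class WithdrawalBehaviour(unittest.TestCase):
    # Each test is named after a statement the customer could have made.
    def test_withdrawal_reduces_the_balance(self):
        account = Account(balance=50)
        account.withdraw(20)
        self.assertEqual(account.balance, 30)

    def test_customer_cannot_withdraw_more_than_the_balance(self):
        account = Account(balance=50)
        with self.assertRaises(ValueError):
            account.withdraw(80)
        self.assertEqual(account.balance, 50)
```

Saved in a file, this can be run with `python -m unittest`. Both classes go green, but only the second one would start failing if the behaviour the customer asked for were broken.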
Your tests should be connected to your business model
A side effect of the above is that, if you are using DDD (domain-driven design), you are going to have tests that nestle up to your domain model very intimately. Your customer is going to use some nouns, those nouns are going to become parts of the model, and your tests will probably express themselves using the same nouns. It is imperative that your tests can handle changes to the “domain language” your customer is using.
The weakness of refactoring
Refactoring is a process where you improve your system by getting your tests running right before and after you make a nonfunctional improvement. The first book I saw on the subject was by Martin Fowler. I recommend this book heartily to anyone who hasn’t heard of the term before. Since TDD makes heavy use of refactoring, it is important to understand the limitations of refactoring.
I really hate the “refactoring” menus in IDEs. What they offer is not refactoring in the sense Fowler used the word. They are automatic transformations of your code, disconnected from your test cases. Such tools operate at a lower level than refactoring does.
Refactoring can only help you make changes in situations where you have tests that cover the behavior that is affected by the code you want to touch. There are lots of implications to this simple statement, but the most important one is that you cannot change the interfaces your tests use using refactoring. You would need to change your tests too, which would place the operation you are undertaking squarely outside of the “refactoring” paradigm. Tool support may help reduce the risk of the change, but the change is not a refactoring operation.
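A small Python sketch of what this means in practice. The two cart classes are hypothetical and exist only for this illustration. The test touches nothing but `add` and `total`, so the internals behind those two methods can be swapped freely.

```python
class CartWithList:
    """Original implementation: remembers every price."""

    def __init__(self):
        self._prices = []

    def add(self, name, price):
        self._prices.append(price)

    def total(self):
        return sum(self._prices)


class CartWithRunningTotal:
    """Reworked internals behind exactly the same interface."""

    def __init__(self):
        self._total = 0

    def add(self, name, price):
        self._total += price

    def total(self):
        return self._total


def check_cart(cart_class):
    # The test only knows about add() and total().
    cart = cart_class()
    cart.add("book", 10)
    cart.add("pen", 2)
    assert cart.total() == 12


# The unchanged test passes before and after: this is a refactoring.
check_cart(CartWithList)
check_cart(CartWithRunningTotal)

# Changing add(name, price) to, say, add(item) would force an edit to
# check_cart itself. The test could no longer vouch for the change,
# so that step is a redesign, not a refactoring.
```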
I know lots of very competent developers who are trying for tests that have sub-second runtimes, which means they are usually unit tests where large parts of the application have been disabled. The danger inherent in this approach is that your test boundaries have to be chosen very carefully, and must not evolve significantly afterwards. A sub-second test runtime is nice, but is not a requirement the customer necessarily cares about if it costs him significantly along some other dimension.
Creating APIs is something developers think of as fun – and it is. This does not mean your application should be littered with hundreds of shoddily designed APIs. Keep the interfaces you expose (to tests or other clients) to a minimum. This is necessary since these bits cannot be allowed to change very often.
I think tests should be running against a system that is as close to the finished architecture as possible, and test against fairly large functional areas. These areas will become your “published” APIs.
Evolution of architecture
TDD in its pure form is an evolutionary process where the developer decides which code structure is the “fittest”. He uses the microscope created by his test suite to do this.
As I have already mentioned here, I have just finished reading “The Greatest Show on Earth” by Richard Dawkins. He devotes a whole chapter to the design shortcomings of evolution. In his case it is a rebuttal of creationist theories, but I am going to pinch his argument here to point out the known problems with trying to evolve an architecture.
On page 360, I found the following paragraph:
During the evolution of the mammals, however, the neck stretched (fish don’t have necks) and the gills disappeared, some of them turning into useful things such as the thyroid and parathyroid glands, and the various other bits and pieces that combine to form the larynx. Those other useful things, including the parts of the larynx, received the blood supply and their nerve connections from the evolutionary descendants of the blood vessels and the nerves that, once upon a time, served the gills in orderly sequence. As the ancestors of mammals evolved further and further away from their fish ancestors, nerves and blood vessels found themselves pulled and stretched in puzzling directions, which distorted their spatial relations one to another. The vertebrate chest and neck became a mess, unlike the tidy symmetrical, serial repetitiveness of fish gills. And the recurrent laryngeal nerve became more than ordinarily exaggerated casualties of this distortion.
Again – go read this book!
So – the recurrent laryngeal nerve. What am I on about? Well, this is a nerve that is routed from the brain, down into the chest to hook around a major blood vessel, then to return to the larynx. An engineer would lop it off and reroute it to save several metres of nerve in the case of a giraffe, but this is something that we cannot completely trust evolutionary strategies to do.
Ok, what is the solution?
I believe in self-organizing teams where there are incentives to choose the best designer to go off and design the APIs that need to be good from the beginning. I believe in using tests to help get the rest of the design good enough.
This won’t help much, though. Some teams are not heavy on design expertise, or worse yet, may think they are. I have had colleagues with two years of work experience refer to their extensive experience while talking to people with double-digit career spans. I am sure I did it myself ten years ago, and I don’t think there is a single developer out there who is not overestimating his knowledge in one or more areas. Teams who could use more design expertise should do their design up front, or at least find the parts of the solution which need to change.
Other teams are able to get things right straight away because most of the team members are experienced designers. If your team has multiple Kent Becks on board, the best thing for an architect to do is probably to shut up and let them get on with it.
There is absolutely nothing I can say from this end to help find this balance, since the balance is team specific. But if you are a person who in some way or other has an interest in the ability of a team to deliver change and quality fast, I suggest you make sure they get a steady stream of domain model changes – and measure them on their ability to get things into production. That way, you will always know what to expect in terms of speed of change.
If they get it wrong – let them stop and sort it out.
If they suggest not releasing to production as often (it is too costly!), get outside help! Your team is in effect telling you they can’t cope, and the problem may be you.