The Three Laws of TDD Revisited
Here are Robert Martin's three laws of TDD written from a BDD perspective:
  1. Write no production code without a failing Example
  2. Write only enough of an Example for it to fail (not compiling counts as failing)
  3. Write only enough production code to get the single failing Example to pass

These rules are not enough to effectively write code using either BDD or TDD. Here are a few more things to take into consideration:
  • Keep your code clean: as you notice unruly code, tame it. However, do this only when your Examples are all passing
  • Keep existing Examples passing
  • When you are looking at code, apply a critical eye and look for violations of sound design principles. (What are those principles? The tutorial will describe violations as they become ugly enough to warrant taking your time to discuss them.)
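
The three laws describe a tight loop. Here is a minimal sketch of one pass through it, in plain Ruby rather than RSpec so it runs standalone; the RpnCalculator class and its x_register method are assumptions based on the calculator built later in this tutorial, not a prescribed implementation:

```ruby
# Laws 1 and 2: write just enough of an Example to fail. Before the
# class below exists, the Example at the bottom fails to even load,
# and not compiling counts as failing.

# Law 3: write only enough production code to pass that one Example.
class RpnCalculator
  def x_register
    0   # hard-coded: the simplest thing that gets the Example passing
  end
end

# The Example, now passing:
calculator = RpnCalculator.new
raise "expected 0" unless calculator.x_register == 0
puts "example passed"
```

Notice the production code is deliberately underpowered; the next failing Example is what earns the right to generalize it.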


Examples passing without validation
Generally, we want Examples to fail for an expected reason before they pass. This case is an exception to that rule, but you'll see that coming up.


All Green
All green refers to the state of the executing Examples; it means all Examples are executing and passing. This terminology derives from unit testing in Smalltalk and was then popularized by JUnit. In Smalltalk, the process was:
  • Red: Create a failing test
  • Green: Get the failing test to pass
  • Blue: Refactor

This is essentially what this tutorial advocates. JUnit turned this into red-bar, green-bar. When the GUI of JUnit executes, it has a progress bar that indicates either:
  • Red: one or more tests have failed
  • Green: all tests executed so far have passed


A Concrete Example of a Hardware Simulator
In mid 2008, I worked with a great group of people on the East Coast. My task was to use their current problem as source material and to take them through TDD (not BDD) on their problem.

They already had a start on the problem and had pretty good source material on what the system needed to do. The system essentially simulated expensive hardware. This simulator would make it possible to test the software that drives the expensive hardware. The problem involved capturing an over-the-wire protocol, which they were already doing in C++. The problem, then, was to take that protocol, respond appropriately, and even initiate messages.

During the week I was there, we:
  • Created an initially simple solution that evolved into something that nearly simulated a complete, albeit simple, "job".
  • Fully tested the execution of the protocol, including starting the system in simulated mode (starting a Java Virtual Machine [JVM] directly) versus real mode (starting a JVM from a C++ application).

On Tuesday, the week after I left, they had managed to fix a few problems on protocol capture and had successfully processed a job. On Wednesday, they were successfully processing a number of simple jobs.

A few weeks later they were processing complex jobs and then a few weeks later they were processing up to 15 simultaneous complex jobs. Their simulator was fast enough that they had to slow it down.

In a very real sense, starting with a granule and letting the system grow through accretion is a viable way to build complex systems.


Why two steps instead of just one?
Why not just create a method and, in the same step, get your Example passing? In this trivial example you could, and it would not be too risky. However, in general you should make sure your Examples fail for the right reason before getting them to pass. Why? It serves as a check that the Example you've written is doing what you think it is doing.

With proper IDE support, going from the intermediate failing step to the working step takes seconds. Even in the environment I'm using (two terminal windows), the work takes 10 to 15 seconds. So while this is "extra" work, it has a purpose.

Eventually you will see these tutorials skipping steps. But for now, the tutorial is keeping the rigor high.


What a silly implementation!
What about the hard-coded return value? Why not just write the real code now, since it's clear what needs to be done? This is an issue of balancing what you know against the risk that what you know is incorrect.

In this case, you could probably guess where this is going. However, as I write the tutorial, this is literally the first time I've worked on this part of this problem, so while I think I know what's going to happen, I'm really not sure. And even if I know the next step, I am not too sure about 2, 3 or more steps ahead.

A general answer to this complaint is that rather than writing what you know the solution should be, write an Example that will force/allow you to write the code you know should be there.

The primary danger of jumping ahead is writing too much production code. That code may not be well covered by your Examples, so you'll essentially have unvalidated code. In addition, you could write more than is necessary. If you do that, the next person to read the code might wonder why there's more code than necessary. You might be that next person, by the way.


Failing Tests: Not Idle Chit-Chat
This is not idle speculation, I've been on projects where this dynamic occurred. One example involved a situation where "certain tests" (this is TDD, not BDD) just failed and people tended to not worry about them. A developer added a new key-value pair to a hash table. This caused a test to fail. He figured that the test was failing for some other reason, so he ignored it.

In fact here's what was happening:
  • The hashtable was in an object being written as a blob
  • The new key-value pair pushed the object just over the blob size limit: the serialized object grew from 4076 bytes to as much as 5000 bytes, while the limit was 4096
  • So his change was breaking the test, but not always, because sometimes the data was just small enough to fit

If tests fail for the wrong reasons, it makes it easy for someone to assume that what they just did could not possibly break the test. Often that just means an assumption is about to be exposed for what it is.

The DRY principle also covers test coverage.

So what can you do? How about accepting a decimal point in the number?


Open/Closed Principle
The Open/Closed Principle suggests that a class, once released into the wild, should not change again except for fixing errors. In Eiffel, this meant leaving a class alone and extending from it to add new behavior. In general, this can also mean depending on an abstraction (interface or abstract base class) and then adding subclasses to complete the behavior.

Why is this relevant? Gerald M. Weinberg describes a relationship between the size of a bug fix and the likelihood that the bug fix introduces new bugs. In his experience, the smaller the fix, the more likely a new bug will be introduced. In his research, 66% of all one-line bug fixes introduce new bugs.

So leaving working code alone is at the heart of the Open/Closed Principle.

Adding new methods to an existing class is in fact changing that class. Adding new methods is less likely to cause problems than changing existing ones, but it is not impossible to break a class when you add methods to it. This suggests that adding methods to a class every time we add a new feature is probably a bad idea.

Another issue is, in general, the more methods on a class, the more likely it is that it will be hard to understand. Finally, having many methods on a class makes it more likely that the class will violate the Single Responsibility Principle.
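
To make this concrete, here is a hedged sketch of the principle in Ruby. The Operation, Addition, and Subtraction names are hypothetical, not from the tutorial's calculator; the point is that a new feature arrives as a new subclass while existing, working classes stay untouched:

```ruby
# The abstraction callers depend on.
class Operation
  def apply(a, b)
    raise NotImplementedError
  end
end

class Addition < Operation
  def apply(a, b)
    a + b
  end
end

# Later, a new feature arrives: no existing class changes,
# we only add a new class.
class Subtraction < Operation
  def apply(a, b)
    a - b
  end
end

ops = { :+ => Addition.new, :- => Subtraction.new }
puts ops[:+].apply(40, 2)   # 42
puts ops[:-].apply(9, 4)    # 5
```

The hash lookup plays the role of depending on the abstraction: callers see only Operation's apply, never the concrete classes.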


Just how much can you follow the three laws?
If I have a proper IDE and decent refactoring tools, then I really do prefer following the three laws strictly. Unfortunately, I have not as of 2008/10 found an IDE that has two particular refactorings/corrections for Ruby:
  • Create class missing at the cursor location (preferably in its own file)
  • Create method missing at the cursor location

I have these in C++, C#, Java, VB.Net and others. Since I don't have them for Ruby, I'd consider relaxing the second rule of TDD to allow me to write an entire Example and then add the missing classes/methods. I am not taking this approach in the tutorial because the strict approach doesn't add that much time and I'd rather you see how it is supposed to be practiced.

Is this realistic? Yes. Many developers better than I have practiced the three laws. In the simulator project mentioned above, during the week I was there, we strictly followed TDD. I was not always at the keyboard, but the work was being projected on a 1080i overhead projector, so that kept the developers "honest."


I did too much
When I wrote "the solution" to the "should cause next digit to reset the x_register" Example, I put this logic in my code. I then realized that it was doing nothing to support any Examples so I removed it.

I then decided to create a test, the one you just worked on, to justify the need to reset the variable. If I had been working with a pairing partner, I'd hope my co-pilot would have called me on violating the third rule of TDD/BDD.


Wrapping Collections
Do it!

I was tempted to just leave it at that, but I'll say a bit more. In my experience, on anything beyond a trivial problem (and on most trivial problems too), at some point I'll want more behavior than the collection offers. The collection implements a "raw" zero-to-many relationship, and most of the time my requirements want more than just a "raw" relationship. In the case of the RpnCalculator, its stack is of infinite size: below anything pushed onto it, there is always a zero. We simulated this by returning 0 if the stack is in fact empty.

It is nearly always a good idea to augment a system-defined collection with your own flair.

When you do, you have several options:
  • Subclass (in .Net you don't have this option because most methods are sealed and non-virtual)
  • Wrap and delegate (this is what we did). You always have this option
  • Open up the class (Ruby and other dynamic languages) - this is dangerous. Opening up a heavily used class like this is "too cool."
  • Open up an instance (shown above) - this is probably too cool as well.

The option that always works, though it might require a touch more work, is the second option and what you did in this tutorial.
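
As a sketch of option two, here is what wrap-and-delegate can look like. OperandStack is a hypothetical name; the behavior (return 0 when the wrapped Array is empty) matches the calculator's infinite stack of zeros described above:

```ruby
class OperandStack
  def initialize
    @values = []          # the wrapped, system-defined collection
  end

  def push(value)
    @values.push(value)
  end

  def pop
    # A raw Array would return nil here; our relationship says 0.
    @values.empty? ? 0 : @values.pop
  end

  def size
    @values.size
  end
end

stack = OperandStack.new
puts stack.pop        # 0 -- an empty raw Array would have given nil
stack.push(42)
puts stack.pop        # 42
```

Only the methods the calculator actually needs are exposed, which is another advantage of wrapping over subclassing.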

If you ever have a collection within another collection, it's immediately time to follow this recommendation.


Example styles and Your audience
Which do you prefer? You have seen three different forms:
  1. First, you saw Examples simply using the calculator.
  2. Second, you saw Examples using a combination of support methods and using the calculator.
  3. Now, you see an Example only using support methods.

The first form gives a concrete example of how the RpnCalculator should be used. The last form makes the Example easy to read and moves the examples toward the way a user might speak about using the calculator.

Here is just the Basic Math Operators context updated:
  describe "Basic Math Operators" do
    it "should add the x_register and the top of the stack" do
      type 46
      press :enter
      press :+
      validate_x_register 92
    end
 
    it "should result in 0 when the calculator is empty" do
      press :+
      validate_x_register 0
    end
 
    it "should reset the x_register after +" do
      type 9
      press :+
      type 8
      validate_x_register 8
    end
 
    it "should reduce the stack by one" do
      type 9
      press :enter
      type 8
      press :enter
      press :+
      @calculator.available_operands.should == 2
    end
 
    it "should subtract the second number entered from the first" do
      type 4
      press :enter
      type 9
      press :-
      validate_x_register(-5)
    end
  end
Here's a comparison between the two styles, first the "programmer" style, then the "user" style:

it "programmer" do
  @calculator.digit_pressed 4
  @calculator.execute_function :enter
  @calculator.digit_pressed 5
  @calculator.digit_pressed 2
  @calculator.execute_function :+
  @calculator.x_register.should == 56
end

it "user" do
  type 4
  press :enter
  type 52
  press :+
  validate_x_register 56
end


Before thinking that one style is the "right" style, consider the audience and even the author. If the audience is the user, then the "user" style is more appropriate. If the audience is another developer, then the "programmer" style might be more appropriate.


You might think that using the "user" method names directly on your calculator is an option. However, that won't look correct:
  @calculator.type 4
  @calculator.press :enter
  @calculator.type 52
  @calculator.press :+
  @calculator.x_register.should == 56
Method names like type and press do not make sense when sent to a calculator object. So the "programmer" names make sense when sent to a calculator, while the "user" names make sense when reading an example as a narrative.
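
One hypothetical way to get the "user" vocabulary without putting those names on the calculator is free-standing helper methods that delegate to it. In RSpec these would live in the spec file or a shared support module; everything below, including the minimal calculator (which exists only so the sketch runs), is an assumption, not the tutorial's actual implementation:

```ruby
# Minimal calculator stub so the helpers have something to delegate to.
class RpnCalculator
  attr_reader :x_register

  def initialize
    @x_register = 0
    @stack = []
    @start_new_number = true
  end

  def digit_pressed(digit)
    @x_register = @start_new_number ? digit : @x_register * 10 + digit
    @start_new_number = false
  end

  def execute_function(operator)
    case operator
    when :enter
      @stack.push(@x_register)
    when :+
      @x_register += (@stack.pop || 0)
    end
    @start_new_number = true
  end
end

@calculator = RpnCalculator.new

# The "user" vocabulary: helpers that delegate to the calculator.
def type(number)
  number.to_s.each_char { |char| @calculator.digit_pressed(char.to_i) }
end

def press(key)
  @calculator.execute_function(key)
end

def validate_x_register(expected)
  raise "expected #{expected}" unless @calculator.x_register == expected
end

type 4
press :enter
type 52
press :+
validate_x_register 56
puts "user-style example passed"
```

The calculator keeps its "programmer" API; the helpers translate the user's vocabulary into it, which is exactly why those names read well in an example but would look wrong on the class itself.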

While this tutorial will not cover it, RSpec has the notion of "story tests", or tests that are meant to be written by the same people who write user stories, your customer, product owner, QA person or even developers. That's coming up in a later tutorial, so this subject will come up again.


Ruby Files Use Spaces
According to the Pickaxe book, the Ruby community has generally settled on using spaces, not tabs, for indentation. In addition, the typical indent size is 2. So all of these examples will demonstrate this preference.


How Many Steps are Normal?
If you are not familiar with TDD/BDD, then you might think all of these small steps are crazy. Maybe they are; however, since writing code is a learned behavior, any practice might seem as crazy as the next. So please try to stick it out and keep working with these "small" steps. Actually, even calling them small is a value judgment. Small compared to what? What you have been doing in the past? Sure, but then maybe these steps are just the right size and you've simply been taking steps that were too large.

One more thing before you go on, the code violates the DRY principle. DRY stands for Don't Repeat Yourself. The examples in the two contexts have duplication. Before moving on you'll perform some refactoring to remove duplication.
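
As a sketch of the kind of duplication removal meant here, consider repeated setup pulled into one place. The create_calculator helper and minimal calculator are hypothetical; in RSpec the same idea usually takes the form of a before(:each) block or shared support methods rather than a plain method:

```ruby
class RpnCalculator
  attr_reader :x_register

  def initialize
    @x_register = 0
  end
end

# Before: each context repeated its own setup line...
#   calculator = RpnCalculator.new
# After: one helper, used by every example in every context.
def create_calculator
  RpnCalculator.new
end

calculator = create_calculator
puts calculator.x_register   # 0
```

Now a change to how the calculator is constructed touches one method instead of every context.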


Why should you care about the examples?
Automated tests give us amazing leverage. If something is not quite right, hard to understand or inefficient, then you might need to change it. What would happen if you could change the implementation of something and know with near certainty that you did not break anything? What if your requirements change and force a redesign? Maybe you are just adding to an existing class. Did what you just do break it? How can you know? The answer is automated tests (or examples in this case). What if you need to build something from the ground up? If you happen to be rewriting something with existing examples, then you have an executable specification from which to work.

I have had several such experiences. I wrote the obligatory login service for a single application. I wrote several unit tests (not even following TDD; rather, I was test-infected) to make sure I got all of the business rules right (there were maybe 70, give or take). After we deployed the first application, we used it as a prototype for a suite of applications. A suite of applications, of course, requires single sign-on. This required significant rewriting, both because of new requirements and because of a need to extend the base architecture to support multiple applications. I had 70 working unit tests. I added around 40 more tests to accommodate the new requirements. Also, with the new system, the underlying implementation became a mess and strongly suggested refactoring to the state design pattern. While all of this was going on, I kept the unit tests passing. It made a daunting task much easier.

So unit tests are a valuable asset and should be treated with the same (maybe even more) TLC that you'd treat production code. Thus, keep your tests clean and well written as you work.


Checking in is slow
What if checking in takes too long - that is, you hesitate to check in frequently because you don't want to wait for it? Can you find a way to reorganize your repository? As a developer and consultant, I see a lot of examples of poorly configured repositories or just poorly written tools and it frustrates me when my tools slow me down.


What is Analysis?
Check out the definition of analysis (http://dictionary.reference.com/browse/analysis). In synopsis: breaking something into its constituent parts.

When you are thinking about adding a new feature, learning how to break it down, while requiring practice for most of us, is a valuable skill. Start with the assumption that it can be broken down, because it nearly always can be. Typically you'll pick a partial solution that, by itself, does not complete anything. In fact, that's what this entire tutorial has been demonstrating.

Think about it in terms of questions. If you cannot figure out the whole thing, are there any questions you can answer that move you in the right direction? Can you then state that question as a test, or experiment? Once you've written the experiment, unlike testing a hypothesis to check your model of reality, you define reality by writing production code to match your hypothesis. If you do not like the reality that you have created, change the definition of reality (your tests), and then alter your universe to match.
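
One way to state a question as an experiment. The question below (what Ruby's to_i does with a decimal point in the string) is my own illustration, not from the tutorial, but it shows the shape: a tiny runnable assertion that answers one narrow question before any production code is written:

```ruby
# Question: what happens to the fractional part when a string with a
# decimal point is converted with to_i?
answer = "3.14".to_i
puts answer                    # 3 -- the experiment answers the question
raise "unexpected" unless answer == 3
```

Once the narrow question is answered, the experiment can either be promoted into a real Example or discarded.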


Did you just write too much production code?
You've written just enough to make everything pass, so no. Doing the simplest thing that can possibly work does not mean you cannot write support methods. It is sometimes (often, even) OK to follow existing form. However, the danger is that you may be copying what is already a bad form, so you need to always use your most important refactoring tool: your brain.

When you do follow form, expect that those support methods will initially be simple, even empty (e.g., handle_paren).


Refactoring
A couple things about refactoring:
  • You did not change the behavior of the code. You only changed the structure of the solution.
  • You are defining "behavior" in terms of the examples. So as you did these changes, you broke none of the examples.
  • You took very small steps and ran the examples often. This way you knew right away if you broke anything. When someone who practices BDD says something pithy like "why have you not run your examples in the past 5 minutes", you can now better understand how they accomplish this.
  • For this to be possible it has to be easy and very quick to run the examples. For this example that will be the case. As you move into more complex problems, you'll have to revisit this.


Ruby And ()'s
Ruby is pretty generous in allowing you to leave off ()'s. This code could have been written without using () so far. The Ruby community generally leaves off parentheses on methods taking no parameters. As for calling a method that takes arguments with or without parentheses, that depends.
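
A small illustration of the convention; the greeting method is made up for this sketch:

```ruby
def greeting(name = "world")
  "hello, #{name}"
end

puts greeting            # zero-argument call: idiomatically no parentheses
puts greeting("ruby")    # with arguments, parentheses are common...
puts greeting "ruby"     # ...but Ruby accepts leaving them off here too
```

All three calls parse the same way; the choice on the last two is style, not semantics.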

