Sunday, May 9, 2010

Non-regression tests: camming devices for humble programmers

More than once I have been surprised by the scarcity, or even complete absence, of tests in both proprietary and open source software. After years of coding, I have learned maybe only one thing: we inescapably make mistakes. For one reason or another, a certain number of bugs always ends up in our code:
  • maybe the specification was fuzzy,
  • or some library did not behave as documented,
  • or a corner case could hardly be anticipated...
Mistakes are simply unavoidable. But isn't it just plain stupid to make the same mistake twice? Fortunately, non-regression tests ensure that any bug previously found and fixed stays permanently removed. Like camming devices in climbing, they save your progress and sometimes... your life.


Without or with non-regression tests?

In this post, I use a concrete example to explain how to write non-regression tests, and how and when to add them. All examples are written in C#, a language I am particularly fond of. Readers knowledgeable in any imperative language such as C, C++ or Java should have no major difficulty understanding the code. I am using SharpDevelop, an open source IDE for C#, and NUnit, an open source testing framework for any .NET language, including C#. NUnit is integrated into SharpDevelop by default; alternatively, it can also run as a standalone program.

As a starting point, let us consider a simple, hypothetical coding situation. We are working on a word manipulation library. The library exports a single namespace WordManipulations, which contains a class Sentence. This class wraps a string and provides some simple text manipulation operations. The first step consists of adding a new project, called Tests, dedicated to non-regression tests. It references both the dynamic-link library (DLL) nunit.framework and the project WordManipulations. We then create a class Suite00_Sentence in order to group all tests related to class Sentence, and mark it with the attribute [TestFixture] to signal to NUnit that it is a test suite. Our first test method is called Test000_CreationFromString and is similarly tagged with the NUnit attribute [Test]:
using NUnit.Framework;
using WordManipulations;

namespace Tests
{
    [TestFixture]
    public class Suite00_Sentence
    {
        [Test]
        public void Test000_CreationFromString()
        {
        }
    }
}
At this point, the organisation of the whole solution should look like this:
  • WordManipulations
    • Sentence
  • Tests
    • Suite00_Sentence
      • Test000_CreationFromString

Let us now check that everything compiles and that our first, empty test runs correctly. In SharpDevelop, simply click the play button of the unit testing side-panel. If you prefer, you can also use the NUnit graphical interface directly: first compile the project Tests, then go to directory Tests/bin/Debug in the Windows file explorer and double-click on file Tests.dll. This should launch NUnit; from there, simply click Run. In both cases, a green light should indicate that the test suite ran successfully.
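If you would rather stay on the command line, NUnit also ships a console runner; assuming nunit-console.exe is on your PATH, pointing it at the compiled test assembly is enough:
nunit-console.exe Tests\bin\Debug\Tests.dll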

For the next step, let us fill in the body of Test000_CreationFromString. It specifies that an instance of class Sentence can be built from a string:
public void Test000_CreationFromString()
{
    Sentence empty = new Sentence("");
}
For this test to succeed, we implement the corresponding constructor in class Sentence as follows:
public class Sentence
{
    private string content;

    public Sentence(string content)
    {
        this.content = content;
    }
}
Let us go a bit further. Say we want a method LastWord to return the index at which the last word of a sentence starts. An additional test simulates the simplest use case of LastWord: when the input is "a b", the expected output should be 2:
[Test]
public void Test001_LastWord()
{
    Sentence input = new Sentence("a b");
    int result = input.LastWord();
    Assert.AreEqual(2, result);
}
Class Assert from the NUnit framework provides several ways to check the expected output of tests.
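For the record, Assert.AreEqual is only one of them; a few other assertions I find handy are sketched below (the conditions are made up for illustration):
Assert.AreEqual(2, result);    // exact value comparison
Assert.IsTrue(result > 0);     // arbitrary boolean condition
Assert.IsNotNull(input);       // reference is not null
Assert.Greater(result, 0);     // ordering comparison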

An implementation of LastWord that passes this test could be, for instance:
public int LastWord()
{
    int i = this.content.Length - 1;
    char currentChar = this.content[i];
    while (currentChar != ' ')
    {
        i--;
        currentChar = this.content[i];
    }
    return (i + 1);
}
However, somebody named Mary Poppins soon reports an index out of range exception on "Supercalifragilisticexpialidocious". We immediately write a non-regression test that reproduces the bug:
[Test]
public void Test002_LastWordShouldNotFailOnSingleWordSentence()
{
    Sentence input = new Sentence("Supercalifragilisticexpialidocious");
    input.LastWord();
}
Before starting a long debugging session, we simplify the test to the bone. It turns out that the current version of LastWord even fails on an empty string. So we modify our test accordingly:
[Test]
public void Test002_LastWordShouldNotFailOnEmptySentence()
{
    Sentence input = new Sentence("");
    input.LastWord();
}
To pass this test, we rewrite LastWord:
public int LastWord()
{
    int i = this.content.Length;
    do
    {
        i--;
    } while ((i > 0) && (this.content[i] != ' '));
    return (i + 1);
}
We then run all the tests written so far. They all succeed, so we can go back to the initial bug report. The method no longer raises any exception on a single-word sentence. However, it does not return the correct value either. So we add another test:
[Test]
public void Test003_LastWordOnSingleWordSentence()
{
    Sentence input = new Sentence("a");
    int result = input.LastWord();
    Assert.AreEqual(0, result);
}
We modify our code once more, hopefully for the last time:
public int LastWord()
{
    for (int i = this.content.Length - 1; i > 0; i--)
    {
        if (this.content[i] == ' ') return (i + 1);
    }
    return 0;
}
In the end, we produced four test cases and much clearer source code than what we started with. As you may have guessed, I am particularly fond of early returns. I actually believe there is still (at least) one bug. Can you find it?

Let me draw some general rules from this example and lay out the methodology of non-regression testing. A non-regression test is a piece of code that checks for the absence of a particular bug. It should fail in the presence of the bug and succeed in its absence. Tests have various purposes:
  • robustness tests check the program does not stop abruptly,
  • functional tests check the program computes the expected value,
  • performance tests check the program runs within its allocated time and memory budget (a sketch of such a test follows this list).
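As an illustration of the last category, here is what a performance test on LastWord could look like. It is only a sketch: the test name, the one-million-character input and the 200 ms budget are invented for the example, and MaxTime is the NUnit attribute (available from version 2.5) that fails a test exceeding its millisecond budget.
[Test, MaxTime(200)]
public void Test004_LastWordIsFastOnLongLastWord()
{
    // The last word is one million characters long,
    // so the backward scan has to traverse all of them.
    Sentence input = new Sentence("a " + new string('a', 1000000));
    input.LastWord();
}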
A test is usually composed of three main phases (illustrated on Test003 right after this list):
  • a prelude sets up the context and/or prepares the input,
  • an action triggers the bug,
  • an assertion checks the expected behaviour.
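Mapped onto Test003 from the example above, the three phases read as follows (comments added for illustration):
[Test]
public void Test003_LastWordOnSingleWordSentence()
{
    Sentence input = new Sentence("a");    // prelude: prepare the input
    int result = input.LastWord();         // action: exercise the method under test
    Assert.AreEqual(0, result);            // assertion: check the expected index
}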
In many cases, the prelude or assertion may not be necessary; in particular, robustness tests obviously do not need any assertion. Tests should be made as concise and simple as possible. A collection of small tests is generally preferable to one large test crowded with assertions. Tests should be independent of each other and could theoretically be run in any order. However, ordering tests by their date of introduction documents the software building process: earlier tests tend to break less often as the software matures. In addition, reordering tests from simplest to most complex greatly speeds up later debugging. By the same logic, any test that fails should be duplicated and reduced to the smallest prefix that still breaks before debugging. Since NUnit runs tests in alphabetical order, these principles lead to the following organization and naming conventions:
  • Test suites should be named after the class they target and prefixed with "SuiteXX_", where XX stands for a two-digit number,
  • Test methods should be named after the behaviour they check and prefixed with "TestXXX_", where XXX stands for a three-digit number.
There are various occasions to add tests. The most important rule is to add at least one test every time a new bug is found. Writing tests before coding is also strongly recommended. In a way, these tests work as executable specifications (see the sketch after this list). Among others, they help:
  • evaluate the quality of classes' external interfaces,
  • choose the role of each method,
  • explore scenarios previously identified during design.
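For instance, before adding a hypothetical WordCount method to Sentence, one could pin down its intended behaviour with a test such as the one below; WordCount does not exist yet, so the test is the specification:
[Test]
public void Test005_WordCountOnTwoWordSentence()
{
    Sentence input = new Sentence("a b");
    Assert.AreEqual(2, input.WordCount());   // WordCount is still to be written
}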
In contrast to textual documentation, which may not be updated as fast as the code changes, executable documentation of this kind always remains true. A few other occasions I can think of off the top of my head are:
  • code refactoring: you fear breaking some invariant, so first write some tests,
  • code learning/review: you must work on a particularly obscure piece of software; for every bit of understanding you painfully acquire, write some tests,
  • code quality ramp up: use the report of a code coverage tool to write some tests.
That said, writing numerous redundant tests out of thin air in order to reach some test count target is simply useless. Coverage of real situations will still be low.

When following a non-regression methodology, every bug becomes an occasion to improve both specification and code quality... forever. Thus, after a while, one begins to appreciate bugs for their true worth. In addition to this profound psychological twist, non-regression testing has other impacts on the overall development process. Since the project runs at all times without breaking its past behaviour, coding becomes a matter of incremental refinement. Project risk is managed almost to the point that the software could be wrapped and shipped any minute.


Centre Pompidou, and it is not a Fluid Catalytic Cracking Unit

On the other hand, non-regression testing also imprints its own particular coding style. A bit like structural expressionist buildings, software produced with this methodology tends to expose its internal structure. In fact, to test the lower layers of a piece of software, it is often necessary to call methods, trigger events and check the value of fields that are private. In order to distinguish these probes from standard methods, I name them TestRunMethodName, TestTriggerEventName and TestGetFieldName respectively. However, the visibility of many classes, which should ideally be internal, needs to be elevated to public in order to become testable. This tends to burden a namespace unnecessarily. If you have an elegant solution to this problem, I would really like to hear about it!
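As a small, invented illustration of this naming scheme, suppose Sentence had a private field cachedLength and a private method Normalize; the corresponding probes would look like this:
public class Sentence
{
    private int cachedLength;                 // hypothetical private field
    private void Normalize() { /* ... */ }    // hypothetical private method

    // Test probes: expose the private members to the test suite.
    public int TestGetCachedLength() { return this.cachedLength; }
    public void TestRunNormalize() { this.Normalize(); }
}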

To summarize this post:
  • either during feature addition or debugging, tests should be written before coding,
  • tests should be as simple and concise as possible,
  • as I like to repeat ad nauseam to my team members:
1 bug => 1 test

In some later post, I plan to talk about increasing test coverage with Partcover, explain how to systematically run non-regression tests before any change reaches a central Mercurial repository, and debunk the myth of untestable graphical interfaces.

My questions for you today are:
  • Do you do non-regression testing?
  • What are your practices like?
  • Which unit testing framework do you use?
  • Do you have any idea how to test classes without making them public?