Saturday, March 31, 2012

On exceptional virtues

Some time ago, a friend of mine pointed me to this post on Joel on Software. I usually agree with Joel's opinions, but I couldn't get behind this piece, which argues strongly for returning error codes rather than throwing exceptions. So I am going to make the counterpoint and explain why I'd rather use exceptions.

First, let me get this straight: although I know C quite well, C++ is a foreign and feared programming language to me. So please keep in mind that what I have to say applies mostly to garbage-collected languages such as C# or Java. A language must be understood as a whole, and some traits can be incompatible with one another. The same goes for exceptions.

I believe programs which use error codes tend to be more verbose and intricate. Consider these two programs:
try
{
    input = read();
    result = process(input);
    output(result);
}
catch (SomeException)
{
    report_error();
}
and
input = read();
if (input == null)
{
    report_error();
    return;
}
result = process(input);
if (result == null)
{
    report_error();
    return;
}
if (!output(result))
{
    report_error();
}
So which one would you rather have?
It can get even worse, as about half of all developers seem to favor functions with a single exit point. The previous example then becomes:
input = read();
if (input != null)
{
    result = process(input);
    if (result != null)
    {
        if (!output(result))
        {
            report_error();
        }
    }
    else
    {
        report_error();
    }
}
else
{
    report_error();
}
We just made a 10-line function span 20 lines. So is my brain under-performing? Or does using error codes increase the cognitive load? One thing is certain: avoiding exceptions forces you to violate two important programming principles:
  • Don't Repeat Yourself: because there are duplicate versions of the error handling code.
  • Single Responsibility Principle: since the logic of the program is interwoven with the management of exceptional cases.
Some C programmers overcome their ingrained aversion to gotos in order to mimic exceptions and write code such as this:
  input = read();
  if (input == null) goto error;
  result = process(input);
  if (result == null) goto error;
  if (output(result)) goto end;
error:
  report_error();
end:
  return;
These programmers surely seem to agree with me.

Let us now suppose for a moment we still go for the lengthy version. After all, having more lines of code makes our productivity statistics look good, doesn't it? Now we are faced with another problem: how are we going to pass an error code out of functions which already return a value? We have several alternatives, each of them with its own set of unwanted consequences:
  • as we just saw in the previous example, we can use null to encode an error,
  • encode the error codes inside the return value,
  • or use an out parameter to store the return value of the method.
Obviously, I wouldn't advise using null, since a forgotten null check will most probably cause a null pointer exception to be raised later in the code. And you are back to square one. Additionally, the exact interpretation of null can be blurry. Let me give you a concrete example from the Java standard API. The Map interface from java.util has a get method which returns the value mapped to a given key, or null if the key is not mapped to any value. However, and herein lies the ambiguity, if the map permits null values, then null may either be the mapped value or mean that there is no mapping at all. The documentation then advises using the containsKey method to distinguish between these cases. What an unsafe API! It would be safer for the method's user if the map either disallowed null values or adopted the solution used by the C# class Dictionary: raise a KeyNotFoundException!
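The ambiguity of Map.get can be reproduced in a few lines of Java. The sketch below is only an illustration: the class MapLookupSketch and the helper getOrThrow are hypothetical names of mine, not part of java.util, and NoSuchElementException is simply the closest standard Java exception I could pick to stand in for the C# KeyNotFoundException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

final class MapLookupSketch {
    // Hypothetical helper (not part of java.util): fails loudly on a missing key,
    // in the spirit of the C# Dictionary indexer.
    static <K, V> V getOrThrow(Map<K, V> map, K key) {
        if (!map.containsKey(key)) {
            throw new NoSuchElementException("No mapping for key: " + key);
        }
        return map.get(key); // may still be null if the map permits null values
    }

    public static void main(String[] args) {
        Map<String, String> nicknames = new HashMap<>(); // HashMap permits null values
        nicknames.put("alice", null);                    // mapped, but to null

        // Both lookups return null, so get alone cannot tell the two cases apart.
        System.out.println(nicknames.get("alice")); // prints "null"
        System.out.println(nicknames.get("bob"));   // prints "null"

        // containsKey is needed to disambiguate.
        System.out.println(nicknames.containsKey("alice")); // prints "true"
        System.out.println(nicknames.containsKey("bob"));   // prints "false"

        try {
            getOrThrow(nicknames, "bob");
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage()); // prints "No mapping for key: bob"
        }
    }
}
```

With the throwing helper, a missing key can no longer be mistaken for a key mapped to null, and a caller who forgets about the missing case finds out immediately.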
The C# Dictionary class offers an alternative way to fetch a value via the TryGetValue method. This method returns a boolean to indicate the presence of a mapping and uses an additional out parameter to pass the value. This practice has one benefit: it avoids the sometimes (but rarely) prohibitive cost of throwing and catching an exception. I will briefly come back to this point later in this post. However, in general, it is not good practice for a method which has a return value to also modify the state of one of its parameters. Indeed, I believe the separation of methods into either commands or queries, as described in the book Clean Code: A Handbook of Agile Software Craftsmanship, to be sound. A quote from this book goes:
Functions should either do something or answer something but not both. Either your function should change the state of an object, or it should return some information about that object.
Sometimes the error code needs to contain more information than just a boolean state. Some coders find it smart to encode the error state in the same type as the one used to store correct values. For instance, if a method normally returns positive integers, then negative integers could be used for error codes. This is a terrible practice which, more often than not, leads the caller to not handle the error code at all.
Contrast these convoluted solutions with the simplicity of wrapping any error information into an exception. Plus you get the stack trace for free!
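Java's own String.indexOf is a real instance of this encoding: it returns -1 when the character is not found. The sketch below shows how easily that code gets ignored and what the exception-based alternative could look like. The class and method names (ErrorCodeSketch, valueUnchecked, valueOrThrow) and the "key:value" line format are made up for the illustration.

```java
final class ErrorCodeSketch {
    // Error-code style: indexOf returns -1 when ':' is absent.
    // The missing check goes unnoticed: -1 + 1 == 0, so the whole input comes back.
    static String valueUnchecked(String line) {
        return line.substring(line.indexOf(':') + 1);
    }

    // Exception style: the failure cannot be mistaken for a valid result.
    static String valueOrThrow(String line) {
        int separator = line.indexOf(':');
        if (separator < 0) {
            throw new IllegalArgumentException("Missing ':' in line: " + line);
        }
        return line.substring(separator + 1);
    }

    public static void main(String[] args) {
        System.out.println(valueUnchecked("port:8080")); // prints "8080"
        System.out.println(valueUnchecked("port8080"));  // prints "port8080", silently wrong
        System.out.println(valueOrThrow("port:8080"));   // prints "8080"
        try {
            valueOrThrow("port8080");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Missing ':' in line: port8080"
        }
    }
}
```

The unchecked version happily returns garbage for malformed input, whereas the throwing version carries a message, and a stack trace, to whoever ends up handling it.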

Finally, error codes must necessarily be dealt with by the caller. Sometimes the caller does not have anything reasonable to do, and so she must bear the burden of propagating errors upwards. Not only that, but she must also be careful not to forget to handle any error code. Exceptions, by contrast, automatically climb the call stack until they are caught. In the worst case, all exceptions may end up in a catch-all at the top level of the application. Also, in Java, checked exceptions, that is, subclasses of Exception which do not inherit from RuntimeException, must be either caught or declared in the method signature. No more silly mistakes!
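Here is a minimal Java sketch of that propagation, reusing the read and process names from the earlier pseudocode. The class name PropagationSketch, the run method and the choice of IOException as the checked exception are assumptions made for the sake of the example.

```java
import java.io.IOException;

final class PropagationSketch {
    // Lowest level: the only place that knows something went wrong.
    static String read(String source) throws IOException {
        if (source.isEmpty()) {
            throw new IOException("nothing to read");
        }
        return source;
    }

    // Intermediate level: has nothing sensible to do about the failure,
    // so it only declares the checked exception and lets it pass through.
    static String process(String source) throws IOException {
        return read(source).toUpperCase();
    }

    // Top level: a single handler for the whole call chain.
    static String run(String source) {
        try {
            return process(source);
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run("hello")); // prints "HELLO"
        System.out.println(run(""));      // prints "error: nothing to read"
    }
}
```

Note that process contains no error handling code at all. Removing its throws clause would not hide the problem: the compiler would reject the program, which is exactly the safety net that error codes lack.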

There may be only one reason to prefer error codes over exceptions, and that is performance. Even though using exceptions does not change the complexity of your algorithm, they are said to be very costly in both C# and Java. Well, let us put hard numbers on this belief.
I wrote two versions of a program which tries to retrieve several values from an empty dictionary. The first version throws exceptions:
Dictionary<int, int> dictionary = new Dictionary<int, int>();
int count = 0;
for (int i = 0; i < N; i++)
{
    int result;
    try
    {
        result = dictionary[i];
    }
    catch (KeyNotFoundException)
    {
        result = -1;
    }
    count += result;
}
Whereas the second program returns error codes:
Dictionary<int, int> dictionary = new Dictionary<int, int>();
int count = 0;
for (int i = 0; i < N; i++)
{
    int result;
    if (!dictionary.TryGetValue(i, out result))
    {
        result = -1;
    }
    count += result;
}
I ran both programs with the loop upper bound N successively equal to 1, 2, 4 and 8 million! Here are the durations of each run in milliseconds:

N (in millions)        1        2        4        8
Error codes           46       78      202      343
Exceptions        111774   221208   442057   882788

Without any doubt, error codes win the speed test, being roughly 2500 times faster than throwing and catching exceptions. However, keep in mind we are experimenting with millions of exceptions. A quarter of an hour to process 8 million exceptions, or about 0.11 milliseconds per exception, does not seem outrageous. So unless exceptions are both frequent and on a critical path of your program, you shouldn't bother. As always, premature optimization is the root of all evil.

To conclude, I hope I have convinced you that using exceptions lets you avoid some code duplication, enables separation of concerns, improves method APIs and decreases the risk of forgetting error cases.

To sum up, here are the take-away lessons from this article:
  • always prefer exceptions to return codes,
  • in Java, try to avoid RuntimeExceptions,
  • optimize exceptions out only if really necessary.

Dali, The Temptation of St. Anthony
or exceptions propagating up the program?