Thursday, July 12, 2012

A case against static


Introduction

Some people believe that all constructs of a programming language have their raison d'être. I am of the opposite school of thought: I strive to write code using a deliberately reduced, simple and coherent subset of the language, because I know this practice is a major factor in code quality. If I were given the choice to remove only one keyword from the Java programming language, I would have no hesitation whatsoever. It would be the static keyword.

I will not delve into a detailed explanation of the meaning of the static keyword. Suffice it to say that it comes in four flavours: static variables, static methods, static classes and static initialization blocks. Static variables are associated with the class in which they are declared (contrast this with standard fields, which necessarily belong to an instance of the class). They can be roughly understood as global variables (with various degrees of visibility). Similarly, static methods do not need any class instance to be called, but cannot access non-static fields without being handed an instance. Static (necessarily nested) classes are a somewhat different story, which I prefer not to tell today. (In short, I despise inner classes and ban them in my coding standards.) Finally, static initialization blocks are not really worth mentioning: they are simply bug nests to avoid at all costs.

First let me stress the fact that, from a theoretical point of view at least, any program can be written without any static variables or methods (except for the entry point, which is by definition necessarily static). The proof is easy to sketch: group all static variables and methods as non-static members of one huge class. Create one instance of this class at the beginning of the program and then pass this instance all over the place for the other classes to use.
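The proof sketch can be illustrated with a minimal example. Every name in it (ApplicationContext, RequestHandler, ContextDemo) is invented for illustration; it only shows the shape of the transformation, not a recommended design.

```java
// Hypothetical example: every name below is invented for illustration.

// Everything that used to be static now lives in one ordinary class.
class ApplicationContext {
    private int requestCount = 0;          // formerly a static variable

    void recordRequest() {                 // formerly a static method
        requestCount++;
    }

    int requestCount() {
        return requestCount;
    }
}

// A class that needs the former globals receives the context explicitly.
class RequestHandler {
    private final ApplicationContext context;

    RequestHandler(ApplicationContext context) {
        this.context = context;
    }

    void handle() {
        context.recordRequest();
    }
}

class ContextDemo {
    // The entry point is the only static member left.
    public static void main(String[] args) {
        ApplicationContext context = new ApplicationContext();
        RequestHandler first = new RequestHandler(context);
        RequestHandler second = new RequestHandler(context);
        first.handle();
        second.handle();
        System.out.println(context.requestCount()); // prints 2
    }
}
```

Both handlers share state only because they were handed the same instance, which is visible at the point where they are constructed.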
Conversely, note that a Java program with only static variables and methods looks like a C program with modules, but without pointers, structures or unions. Pointers to structures can be somewhat emulated by classes without any methods. Hardcore C programmers must now be grinding their teeth: I am well aware Java does not allow for the power of low level operations such as pointer arithmetic or arbitrary casts (for example from structure to array of characters). Unions however can be recovered with some clever tricks using inheritance.
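The text does not spell out which inheritance tricks are meant. One plausible reading, sketched here under that assumption, is a base class with one subclass per alternative of the union. The names Value, IntValue and TextValue are invented, and unlike a C union the alternatives do not share storage.

```java
// Hypothetical sketch: emulates "union { int number; const char *text; }"
// with one subclass per alternative. All names are invented.
abstract class Value {
    abstract String describe();
}

class IntValue extends Value {
    final int number;

    IntValue(int number) {
        this.number = number;
    }

    @Override
    String describe() {
        return "int " + number;
    }
}

class TextValue extends Value {
    final String text;

    TextValue(String text) {
        this.text = text;
    }

    @Override
    String describe() {
        return "text " + text;
    }
}

class UnionDemo {
    public static void main(String[] args) {
        // A single variable type can hold either alternative.
        Value[] values = { new IntValue(42), new TextValue("hello") };
        for (Value value : values) {
            System.out.println(value.describe());
        }
    }
}
```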

This brief language analysis underlines the fact that two orthogonal ways of organizing code are really competing within Java:
  • pure modular procedural programming (only static methods, and non-static variables in lightweight data objects),
  • pure dependency-injected object orientation (no static variables or methods).

In practice, most programs I have encountered are a mix of these two relatively incompatible paradigms. And it shows. So I am going to make the case that writing within the pure object-oriented fragment of Java is more productive and leads to more robust code.

Problem statement

Let me start by examining some very small code snippets. Although simple, these examples are representative of the use of static methods in real-world programs.
The most extensive use of static methods is to make a service accessible from any place in the code:
Service.perform();
On the surface, nothing is wrong here. However, the truth is that real programs are rarely that simple. In most concrete cases, calling a static method such as perform on its own will not work. The method will throw some exception, because some internal structure needs to be initialized first. To do so, one needs to call yet another static method beforehand:
Service.initialize();
Service.perform();
This is not the end of the story yet. Method initialize will usually be called once during the setup phase of the application (or maybe worse from another class's initialize method), whereas perform gets called several times, wherever it is required. The two methods, although logically bound, are thus syntactically far apart in the code:
// initialization phase
Service.initialize();
...
// application body
Service.perform();
You may think it is not a big deal. In fact, browsing the documentation of class Service may be sufficient to quickly understand its correct usage (even though documentation has an awkward tendency to age very quickly).
But, let us now go just one step further. Consider this piece of code:
Treatment treatment = new Treatment();
treatment.process();
There is a trap in this code, hidden from the eyes of the unaware external reader. What matters here is that somewhere down in the body of method process lies a call to the same static method Service.perform. This means that method process can be used only if Service has been correctly initialized beforehand:
Service.initialize();
...
Treatment treatment = new Treatment();
treatment.process();
Now multiply this pattern manyfold, spread it all over your application, and you have just got yourself a maintenance nightmare. Herein lies the core of the problem with using static: the introduction of hidden dependencies, also called hidden temporal coupling. It is called temporal because Service.initialize must be called before Service.perform, and hidden because no indication is visible in the signature of either the constructor of Treatment or its method process. As a side note, the book Clean Code: A Handbook of Agile Software Craftsmanship lists hidden temporal coupling among its code smells.
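A minimal runnable sketch of this trap follows. Only the names Service, Treatment, initialize, perform and process come from the text; the method bodies, the return values and the choice of IllegalStateException are assumptions made for illustration.

```java
// Hypothetical sketch: the bodies are invented, only the names Service and
// Treatment (and their methods) come from the text.
class Service {
    private static String resource;        // global mutable state

    static void initialize() {
        resource = "ready";
    }

    static String perform() {
        if (resource == null) {
            throw new IllegalStateException("Service.initialize() was not called");
        }
        return "performed with " + resource;
    }
}

class Treatment {
    // Nothing in this signature reveals the dependency on Service.
    String process() {
        return Service.perform();
    }
}

class HiddenCouplingDemo {
    public static void main(String[] args) {
        Treatment treatment = new Treatment();
        try {
            treatment.process();           // fails: Service is not initialized yet
        } catch (IllegalStateException e) {
            System.out.println("Failed: " + e.getMessage());
        }
        Service.initialize();
        System.out.println(treatment.process()); // now succeeds
    }
}
```

The same call to process fails or succeeds depending on something that happened elsewhere, and the compiler cannot help.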

Medusa by Gian Lorenzo Bernini, 1630
Do not let it petrify your code base!
 

Solution

Now contrast the previous code with the pure object oriented alternative. Instead of calling two static methods, one must first create an instance of the Service. Then method perform can be later called on this instance:
Service serviceProvider = new Service();
...
serviceProvider.perform();
The instance of Service constitutes a tangible proof that initialization correctly took place and a guarantee that method perform may be safely called.
Of course, the programmer must now make the effort to propagate this instance everywhere he wants to use the service. For instance, the serviceProvider could be passed as a parameter of the Treatment constructor:
Treatment treatment = new Treatment(serviceProvider);
treatment.process();
The dependency is made clear by the signature of constructor Treatment. At first, this coding style may, for some, seem more demanding. However it is really more relaxing. There are no surprises. The programmer can safely rely on the signatures of the constructors and methods to ensure and document all dependencies.
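The instance-based alternative can be sketched in full as follows. As before, only the names Service, Treatment, perform and process come from the text; the bodies and return values are assumptions.

```java
// Hypothetical sketch: the bodies are invented, only the names Service and
// Treatment (and their methods) come from the text.
class Service {
    private final String resource;

    Service() {
        // Initialization happens here, so every instance is ready to use.
        this.resource = "ready";
    }

    String perform() {
        return "performed with " + resource;
    }
}

class Treatment {
    private final Service service;

    // The constructor signature documents the dependency.
    Treatment(Service service) {
        this.service = service;
    }

    String process() {
        return service.perform();
    }
}

class InjectionDemo {
    public static void main(String[] args) {
        Service serviceProvider = new Service();
        Treatment treatment = new Treatment(serviceProvider);
        System.out.println(treatment.process()); // prints "performed with ready"
    }
}
```

There is no way to write a Treatment that compiles without first obtaining a Service, so the ordering constraint is enforced by the type checker rather than by documentation.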

Now that I have presented the problem and its solution, let me list the multiple reasons why static, and in particular hidden temporal coupling, is rather bad for your health.

Perseus slaying Medusa
by Laurent-Honoré Marqueste, 1876

The several evils of static

Steep learning curve

Of all the problems related to static, this may seem, in theory, the least worrisome. However, in practice, it will probably consume developers' time the most. Whenever a coder needs to work on a piece of code he did not write in the first place, he will have a hard time discovering the implicit dependencies. Unless code is very well documented (and documentation is kept up to date), he will probably need to ask someone more experienced which static methods must be magically called. He will not be able to rely on automatic completion either, since static methods are not carried by the data they operate on.

Costly component reuse

Calling static methods makes code reuse very costly. But that becomes noticeable only at the last minute, when actually trying to extract a piece of code and incorporate it elsewhere. You will have a hard time getting the result to compile and execute correctly. Only then will it become clear how heavily the displaced piece of code relied on hidden dependencies. Every single static call must be painstakingly tracked and replaced by some alternative provided in the new execution context. Another solution may seem faster to implement but is even less acceptable: add all the dependencies into the new context. To state it bluntly, highly coupled code cannot be cheaply reused.

Increased risks of memory leaks

Memory held by static variables (and everything they transitively hold onto) has to be released explicitly. Programmers familiar with garbage-collected languages (myself included) tend to forget this issue easily. In our defense, too many details must be handled:
  • to write a method which releases memory,
  • not to forget calling it everywhere it is needed.
By contrast, the memory graph held by non-static variables becomes eligible for garbage collection as soon as it is no longer reachable, typically when program execution leaves the scope. If the documentation is not accurate, programmers may be totally unaware that a static method allocates permanent memory, which can easily lead to memory leaks. In this case again, the API lies!
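Here is a small sketch of the difference. The classes StaticHistory, InstanceHistory and LeakDemo are invented for illustration; the point is only that a static collection stays reachable after the method that filled it has returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: all classes are invented for illustration.
class StaticHistory {
    // Reachable for as long as the class is loaded, usually the whole run.
    private static final List<byte[]> ENTRIES = new ArrayList<>();

    static void record(byte[] entry) {
        ENTRIES.add(entry);
    }

    static int size() {
        return ENTRIES.size();
    }

    // Memory is only given back if someone remembers to call this.
    static void clear() {
        ENTRIES.clear();
    }
}

class InstanceHistory {
    private final List<byte[]> entries = new ArrayList<>();

    void record(byte[] entry) {
        entries.add(entry);
    }
}

class LeakDemo {
    static void handleRequest() {
        InstanceHistory local = new InstanceHistory();
        local.record(new byte[1024]);         // unreachable once this method returns
        StaticHistory.record(new byte[1024]); // still reachable after this method returns
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            handleRequest();
        }
        System.out.println(StaticHistory.size()); // prints 3
        StaticHistory.clear();
        System.out.println(StaticHistory.size()); // prints 0
    }
}
```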

Testability issues

Unless you still apply software development techniques from the previous century, you write unit tests. Static methods and variables make unit tests a whole lot harder to write:
  • dependencies must be discovered in order to correctly set the initial state,
  • memory not released in a test may impact the next one.
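The second point can be reproduced without any test framework. In this sketch, Registry and PollutionDemo are invented names, and the check method stands in for a unit test that assumes it starts from a clean state.

```java
// Hypothetical sketch: Registry and the check below are invented for illustration.
class Registry {
    private static int registered = 0;     // global mutable state

    static void register() {
        registered++;
    }

    static int count() {
        return registered;
    }

    static void reset() {
        registered = 0;
    }
}

class PollutionDemo {
    // Stands in for a unit test that assumes an empty registry.
    static boolean registersExactlyOne() {
        Registry.register();
        return Registry.count() == 1;
    }

    public static void main(String[] args) {
        System.out.println(registersExactlyOne()); // true: the registry was empty
        System.out.println(registersExactlyOne()); // false: state left over from the first run
        Registry.reset();
        System.out.println(registersExactlyOne()); // true again after an explicit reset
    }
}
```

The check is correct in isolation and fails only because of what ran before it, which is exactly the kind of order-dependent failure that is tedious to diagnose in a large suite.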
Worst of all, static methods do not provide seams. To perform unit tests, it is often the case that dependencies must be replaced by fake implementations. For instance, suppose you are testing a piece of code which emits orders to a printer:
Component c = new Component();
c.process();
Suppose then that method process performs the printing order with the static call sendPage:
void process() {
    Result result = this.doSomething();
    Page pageToPrint = this.presentResult(result);
    Printer.sendPage(pageToPrint);
}
You would rather not empty another printer ink cartridge every time the test suite is executed. However, there is no easy way to change the behavior of method sendPage for the duration of the test only. One way, which I clearly do not recommend, would be to add yet another static method setImplementation to class Printer. Then the test would go like this:
FakePrinter fakePrinter = new FakePrinter();
Printer.setImplementation(fakePrinter);
Component c = new Component();
c.process();
The much more straightforward solution is to have the constructor for Component depend explicitly on an interface of a printer, which may either be the real Printer (in production code) or a fake (in test code).
FakePrinter fakePrinter = new FakePrinter();
Component c = new Component(fakePrinter);
c.process();
Other examples in the same vein could include a logger whose state you would like to check, the queue of a thread runner which you would like not to fill, files which you would rather not create...
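To make the printer example concrete, here is a self-contained sketch of the constructor-injected version. The interface name PagePrinter, the use of String for pages and the placeholder bodies are assumptions; only Component, FakePrinter, process and sendPage come from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: PagePrinter, the String page type and the method
// bodies are assumptions made for illustration.
interface PagePrinter {
    void sendPage(String page);
}

// Test double: records pages instead of printing them.
class FakePrinter implements PagePrinter {
    final List<String> sentPages = new ArrayList<>();

    @Override
    public void sendPage(String page) {
        sentPages.add(page);
    }
}

class Component {
    private final PagePrinter printer;

    // The seam: any PagePrinter implementation can be supplied here.
    Component(PagePrinter printer) {
        this.printer = printer;
    }

    void process() {
        String result = "42";                     // stands in for doSomething()
        String pageToPrint = "Result: " + result; // stands in for presentResult(result)
        printer.sendPage(pageToPrint);
    }
}

class SeamDemo {
    public static void main(String[] args) {
        FakePrinter fakePrinter = new FakePrinter();
        Component c = new Component(fakePrinter);
        c.process();
        System.out.println(fakePrinter.sentPages); // prints [Result: 42]
    }
}
```

In production code, a class wrapping the real printer would implement the same interface, and Component would not need to change.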

Some hard to track bugs

Static methods and variables are a source of bugs of the hard kind. Let me simply illustrate with a real case I once stumbled upon. The application had a configuration service implemented with static methods. In order to retrieve the string value of a property, one would call:
String Configuration.getValue(String key);
The service also had an initialization method:
void Configuration.initialize(InputStream file);
Initialization would read all the configuration key-value pairs present in the input stream and fill a hash table. Calling getValue after initialization would return the property configured by the user. However, calling getValue before initialization would return some default value (most of the time adequate but possibly different from the user's wish). Obviously method Configuration.getValue was called all over the place, even in the program initialization phase. So, during some code refactoring, I unknowingly moved a call to the getValue method before the initialization phase. This bug was found very late because no regression test covered this particular value, and everything seemed to work fine with the default value. It also took some time to pinpoint the root cause of the problem.
Without static, this problem simply cannot arise:
Configuration configuration = new Configuration(file); // file is an InputStream
...
configuration.getValue("some-key");
This is simply because one must first hold an instance of Configuration in order to read a configuration value, and the configuration file is necessarily read when the constructor is called. This category of bugs is a classic consequence of hidden temporal coupling.
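A minimal sketch of such an instance-based Configuration is given here. The original file format is not described, so the use of java.util.Properties and its key=value syntax is an assumption, as is wrapping IOException in an unchecked exception.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical sketch: the key=value format handled by java.util.Properties
// is an assumption; the original file format is not described.
class Configuration {
    private final Properties values = new Properties();

    // Reading happens in the constructor: no instance can exist
    // before the stream has been loaded.
    Configuration(InputStream file) {
        try {
            values.load(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String getValue(String key) {
        return values.getProperty(key);
    }
}

class ConfigurationDemo {
    public static void main(String[] args) {
        InputStream file = new ByteArrayInputStream(
                "some-key=user-value\n".getBytes(StandardCharsets.ISO_8859_1));
        Configuration configuration = new Configuration(file);
        System.out.println(configuration.getValue("some-key")); // prints user-value
    }
}
```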
Similarly, memory leaks may also cause costly bugs, found late in the development cycle.

Architecture erosion

Static methods and variables are by their very nature global: they can be easily accessed from anywhere in the application. Pressed by time, developers may be tempted to use these handy static methods without paying their true cost upfront: carefully thinking about the overall architectural logic. Doing so, they introduce additional, hidden, dependencies. The application architecture quickly decays.

Unnecessary coupling

What you don't see doesn't bother you... until it hurts you. Hidden coupling is bad because it is hidden: you won't spend time auditing dependencies and cleaning them up. With time, unnecessary coupling will undoubtedly increase without you even noticing. So removing static methods should be a top priority. It will take time. You will discover unexpected, sometimes frightening links in your application. But at least, once dependencies are explicit, you can work on them: move them around, remove some, divide others... In the end you will get minimally coupled, tight and focused pieces of code.

Static propagates static

Finally, I am under the impression that static leads to more static. This may be because static methods cannot call non-static methods or access instance variables without an instance at hand. So when a developer needs to extend the behavior of a static method, he may feel stuck. Instead of trying to remove the static method, he may choose the path of least resistance by simply adding more static methods or fields. He will then gradually encounter more difficulties writing truly object-oriented code. For instance, inheritance will not be possible. He will make more data public, losing encapsulation. He will write more code like this:
A.fill(b);
A.process(b);
A.print(b);
At this point, he ends up being trapped in a C style of programming where objects are used as passive data-structures.
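For contrast, here is a sketch of the same three steps carried by the object itself. The class Batch, its fields and the summing logic are invented; only the fill, process, print sequence comes from the text.

```java
// Hypothetical sketch: Batch and its contents are invented; only the
// fill / process / print sequence comes from the text.
class Batch {
    private int[] values = new int[0];     // encapsulated, no public data
    private int total;

    Batch fill(int... newValues) {
        values = newValues.clone();
        return this;
    }

    Batch process() {
        total = 0;
        for (int value : values) {
            total += value;
        }
        return this;
    }

    String summary() {
        return "total=" + total;
    }

    void print() {
        System.out.println(summary());
    }
}

class BehaviorDemo {
    public static void main(String[] args) {
        Batch b = new Batch();
        b.fill(1, 2, 3).process().print(); // prints total=6
    }
}
```

Because the methods are instance methods, a subclass can override them, and the fields can stay private.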

Acceptable uses of static

For the sake of balance, there are, in theory, some harmless uses of static. They all obey two conditions:
  • no hidden temporal coupling,
  • no global mutable state. 
In other words, they are stateless. Let me list all the examples that I can think of.

Constants

When final, static variables are acceptable. Strings and integers fall under this category. However, non-literal final data structures (such as hash tables) are not, since their content can vary throughout the life of the application. Hand-crafted enums, implemented as several constant objects, are also valid. Loggers may be admissible, even though static loggers become a problem as soon as you wish to mock them for testing purposes.
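The distinction between a literal constant and a final data structure can be shown in a few lines. The class Limits and its contents are invented for illustration. The read-only table is shown only as a point of comparison; whether it would count as an acceptable constant is not something the text settles.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the class and its contents are invented for illustration.
class Limits {
    // A true constant: a literal that can never change.
    static final int MAX_RETRIES = 3;

    // final only freezes the reference; the table itself stays mutable.
    static final Map<String, Integer> MUTABLE = new HashMap<>();

    // An immutable table rejects writes at run time.
    static final Map<String, Integer> READ_ONLY =
            Collections.singletonMap("timeout", 30);
}

class ConstantsDemo {
    public static void main(String[] args) {
        Limits.MUTABLE.put("timeout", 30);   // compiles and succeeds despite final
        System.out.println(Limits.MUTABLE.containsKey("timeout")); // prints true
        try {
            Limits.READ_ONLY.put("timeout", 60);
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only table rejected the write");
        }
    }
}
```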

Fresh results

Static methods which return a new result every time are benign. They often provide alternative constructors. For instance the Matrix class may have a default constructor with only zeros as well as a static method identity to build a matrix with its diagonal filled with ones. Careful though, because singletons with global mutable state are only one step away. So, in this particular case I would rather have an instance of a MatrixFactory with a non-static buildIdentityMatrix method.
By the way, about singletons, Misko Hevery wrote a well-thought-out and exhaustive piece underlining their dangers.
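Returning to the Matrix example, the factory variant could look like the sketch below. The array-backed Matrix, its get and set methods and the size parameter are assumptions; only the names Matrix, MatrixFactory and buildIdentityMatrix come from the text.

```java
// Hypothetical sketch: the array-backed Matrix and its accessors are invented.
class Matrix {
    private final double[][] cells;

    // Default construction: a size x size matrix filled with zeros.
    Matrix(int size) {
        cells = new double[size][size];
    }

    double get(int row, int column) {
        return cells[row][column];
    }

    void set(int row, int column, double value) {
        cells[row][column] = value;
    }
}

class MatrixFactory {
    // Non-static: callers need a factory instance, which can be passed
    // around or replaced like any other dependency.
    Matrix buildIdentityMatrix(int size) {
        Matrix identity = new Matrix(size);
        for (int i = 0; i < size; i++) {
            identity.set(i, i, 1.0);
        }
        return identity;
    }
}

class MatrixDemo {
    public static void main(String[] args) {
        MatrixFactory factory = new MatrixFactory();
        Matrix identity = factory.buildIdentityMatrix(2);
        System.out.println(identity.get(0, 0) + " " + identity.get(0, 1)); // prints 1.0 0.0
    }
}
```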

Pure methods

Static methods which work only on the state carried by their arguments are also, in theory, non-lethal. A nice concrete example is the set of assertion methods provided by the JUnit framework (say for instance Assert.assertEquals). However, most of the time, having such methods is a sign of bad design. The method should be carried by the object it modifies. The only acceptable exception could be final (sealed in C#) objects, whose behavior cannot be modified by inheritance.
But even in this case, I would either build a manager or encapsulate rather than add static methods. In C#, there is also always the solution of extension methods. But are they a good thing? For lack of experience with this language construct, I haven't made up my mind yet.

Program entry point

Whether you like it or not, you can't escape the fact that the program entry point in either C# or Java is a static method!

Head of Medusa
by Peter Paul Rubens, 1617
Let her rest in peace!

Conclusion

With the notable exceptions of the program entry point and literal constants, all uses of the static keyword should be banned. At first static methods and variables may seem convenient, especially for lazy programmers. But the cost is simply too high: hidden temporal coupling will rot away your program. The consequences range from a steep learning curve and decreased reusability to poor testability and rigid design. The alternative is to explicitly trace dependencies with object instances which are propagated through constructor or method parameters. In return, the signature of each class naturally documents all its dependencies.

1 comment:

  1. One more problem with static state: reentrancy and thread-safety. It is possible to write thread-safe and reentrant code with static state, but it gets pretty hairy.

    libhdf5 is a good example of a disaster due to static state. HDF5 is a kind of binary xml format, and libhdf5 is a reader/writer for this format, which provides an in-memory object model of elements in the file. Unfortunately, all objects are handled through ids that libhdf5 uses to look up entries in static tables. This design makes it easy to write interfaces to this library for any language, but it means you cannot have two threads working on different files at the same time.

    When you control your entire application, you can create a static object to lock while using libhdf5, but if you assemble plugins of different origins, each using libhdf5, they will not share their lock...
