Saturday, May 30, 2015

Economy of Means: On the Elimination of Inheritance (3/6)

From inheritance to decorator

This is one of the advanced questions I sometimes ask in interviews:
Suppose you have three classes A, B and C. Class A has a method f which performs some task. Classes B and C inherit A. Both classes override method f to add some extra behaviour to the base version. Class B outputs some log:
class B extends A {
    public override void f() {
        super.f(); 
        log("f was called");
    }
}
, while class C increments some counter:
class C extends A {
    private int counter;

    public C() {
        this.counter = 0;
    }

    override public override void f() {
        super.f();
        this.counter++;
    }
}
Now, you would like to instantiate a class which mixes all behaviours together: the base behaviour, the log and the counter increment. How would you do it? I am expecting a clean design, which can easily evolve in the future if needed.

Most candidates will struggle, yet the core idea is again to use a combination of interface and composition in replacement of inheritance. Through a gradual refactoring, we flatten the class hierarchy and introduce the decorator pattern.

First let us define interface I, which declares method f:
interface I {
    void f();
}
Then, we can have all classes implement interface I. Next, we can modify class B so that, instead of inheriting class A, it expects an object with interface I as argument of its constructor. In its implementation of method f, class B calls the base version of method f from this interface rather than from its super-class:
class B implements I {
    private I i;

    B(I i) {
        this.i = i;
    }

    void f() {
        this.i.f(); 
        log("f was called");
    }
}
We do the same with class C:
class C implements I {
    private I i;
    private int counter;

    public C(I i) {
        this.i = i;
        this.counter = 0;
    }

    public void f() {
        this.i.f();
        this.counter++;
    }
}
 
Classes B and C are now both decorators: if we view classes as API transformers, then we can see they both expect an interface I as input, and they both build instances which provide the same interface I, yet adding a layer of behaviour in the middle.

Peeling the onion

Thanks to the decorator pattern, getting any combination we could wish for is extremely easy:
  • new A() provides only f,
  • new B(new A()) provides f and logging,
  • new C(new A()) provides f and statistics,
  • new B(new C(new A())) or new C(new B(new A())) provide f, logging and statistics. 
This design is extensible: one can easily code an additional decorator. It will assemble seamlessly with the other decorators during instantiation.
At last, as was already shown in the previous recipe example, we could, if necessary, implement a factory that offers shortcuts to rapidly instantiate a variety of classes, all implementing interface I.


Avoiding override

As previously mentioned here, overridden methods are sometimes a form of dead code. Indeed, writing some code in the base class only to scratch it in the children, is not very elegant. Hopefully, keyword override can be replaced by the use of interfaces. Here is how.
Let us start again from our canonical hierarchy: class A father of classes B and C. Class A defines a method f, which is overridden in C (but not in B). All other things being equals, it is preferable to create an interface I which defines method f and have class A accept this interface as one of its constructor argument.
Two additional classes which both implement interface I are created out of the definitions of f present in A and C. By splitting our implementation into smaller components, we gained flexibility:
  • We can assemble a class B with the version of method f which was originally present in C, mixed in.
  • Interface I (possibly after some renaming) can now be reused outside of its original context.

Sunday, May 17, 2015

Economy of Means: On the Elimination of Inheritance (2/6)

The path away from inheritance

This section shows more complex designs that are based on inheritance. In each case, I explain how it is possible to avoid inheritance.

Default behaviour

Writing some generic engine that drives plugins is a powerful way to factor code and speed up development. Consider an hypothetical case where plugins would have to comply with the following interface:
interface IEngineEventsListener {
    Statistics processStartup(Car);
    void processAcceleration(Car, int);
    void processDecelaration(Car, int);
    void processStop(Car);
    void processCrash(Car, Crash);
    // etc...
    void processEvent20(Driver driver);
}
As you can see, the interface is quite rich. And it can be pretty tedious to provide a full plugin implementation. Now suppose all actions are optional. So it has been decided to add a class which implements standard default behaviours for every method:
class DefaultEngineEventsListener
                    implements IEngineEventsListener {
    public Statistics processStartup(CarData) {
        /* return some default value */
    }
    public void processAcceleration(CarData, int) {
        /* do nothing */
    }
    // etc...
}
Plugin developers can now inherit from this class and override only the methods for which a specific behaviour is required.
To me, this pattern is acceptable only as long as the default behaviour is empty and there are no method with a return value. Otherwise, overridden methods and default values are a form of dead code. This goes against one of the minimalistic rules:
Never ship code which will not be executed in the production environment.
In practice, actual designs generally tend to belong to one of these two categories:
  • most plugins implement only one method, in which case methods can be registered independently,
  • most plugins implement most methods, in which case cost of implementing the whole interface every time is acceptable.
Additionally, the variety of methods can be reduced by having their signatures converge. Taking this idea to the extreme, one could substitute all methods by only one by merging all data under a common interface (the union type as described previously in section A matter of choice). But then, of course, the burden of type dispatch would shift onto the plugin implementer.
As a last note, less strongly typed programming languages such as python or javascript simply allow partial implementations. Also, the syntax to register individual methods (or closures) tends to be lighter.
 

 

Antagonism between class hierarchy and methods call chain

Let us do some cooking. A WitchSoup is a Recipe. Base class Recipe is abstract since, during initialization, it calls abstract method getAdditionalTasks:
abstract class Recipe {

    private List<Task> _tasks;

    public Recipe() {
        this._tasks = new ArrayList<Task>();
        this._tasks.add(new PrepareUstensils());
        this._tasks.addAll(getAdditionalTasks());
        this._tasks.add(new CleanKitchen());
    }

    abstract List<Task> getAdditionalTasks();

    public execute() {
        for (Task task in this._tasks) {
            task.executes();
        }
    }
}

class WitchSoup extends Recipe {

    public WitchSoup() {
        super();
    }

    List<Task> getAdditionalTasks() {
        List<Task> additionalTasks = new ArrayList<Task>();
        additionalTasks.add(new MeltButter());
        additionalTasks.add(new MashBeans());        
        additionalTasks.add(new Boil());
        return additionalTasks;
    }
}
For some of you, this code may seem awkward. Indeed, it is wrong in more than one way. First, every new recipe will result in the creation of one more class. This leads to extreme verbosity. Second, class Recipe is abstract so it can not be instantiated: it has zero value on its own.
Last and worse, a loop has now been introduced into the classes dependencies. Simple inheritance creates a dependence from a class to its parent. An abstract class also depends on its descendants. Hence, method getAdditionalTasks can be viewed as a callback in the interface which separates WitchSoup from Recipe. As you may have experienced when writing unit tests with mocks, interfaces with callbacks are always harder to reason with. Classes WitchSoup and Recipe are actually very tightly coupled and cannot any longer be thought of as separate entities.

Unfortunately, I sometimes find this pattern in real (admittedly poor) production code (often buried down under a few more obfuscation layers).
Here is how this mess could be cleaned up:
class Recipe {

    private List<Task> _tasks;

    public Recipe(List<Task> additionalTasks) {
        this._tasks = new ArrayList<Task>();
        this._tasks.add(new PrepareUstensils());
        this._tasks.addAll(additionalTasks);
        this._tasks.add(new CleanKitchen());
    }

    public execute() {
        for (Task task in this._tasks) {
            task.executes();
        }
    }
}

class RecipeBook {

    Recipe retrieveWitchSoupRecipe() {
        List<Task> additionalTasks = new ArrayList<Task>();
        additionalTasks.add(new MeltButter());
        additionalTasks.add(new MashBeans()); 
        additionalTasks.add(new Boil());
        return new Recipe(additionalTasks);
    }
}
Abstract method getAdditionalTasks was simply replaced by a standard parameter of the Recipe's constructor. As class Recipe is now concrete, the need for WitchSoup or any other specific Recipe descendant disappears. Each recipe is replaced by a different creation method belonging to the factory class RecipeBook. The class hierarchy has flattened dramatically.

Flavor network: graph of ingredients as paired in recipe


From template method to strategy

Let us deepen our understanding of keyword abstract by playing with the template method pattern. In the following code, method accumulate combines all the integers present in its list parameter. It calls the abstract method combine to process integers pairwise. Any combinator can be chosen by implementing various subclasses. For instance, class MaximumAccumulator combines the integers by choosing the maximum, while SumAccumulator performs the addition.
abstract class Accumulator {
    public int accumulate(List<Integer> input) {
        int result = 0;
        for (Integer element in input) {
            combine(result, input);
        }
        return result;
    }

    public abstract int combine(int element1, int element2);
}

class MaximumAccumulator extends Accumulator {
    public int combine(int element1, int element2) {
        if (element1 < element2) return element2;
        return element1;
    }
}

class SumAccumulator extends Accumulator {
    public int combine(int element1, int element2) {
        return element1 + element2;
    }
}
Using pattern strategy we can replace inheritance by both an interface and composition. First we introduce interface Combinator:
interface Combinator {
    int combine(int element1, int element2);
}
Then, the Accumulator class can accept a Combinator as a constructor argument instead of expecting subclasses. The class loses its abstract status:
class Accumulator {

    private Combinator combinator;

    public Accumulator(Combinator combinator) {
        this.combinator = combinator;
    }

    public int accumulate(List<Integer> input) {
        int result = 0;
        for (Integer element in input) {
            this.combinator.combine(result, input);
        }
        return result;
    }
}
At last, classes MaximumAccumator and SumAccumulator can now implement interface Combinator. We also rename them to express their roles more precisely:
class MaximumCombinator implements Combinator {
    public int combine(int element1, int element2) {
        if (element1 < element2) return element2;
        return element1;
    }
}

class SumCombinator implements Combinator {
    public int combine(int element1, int element2) {
        return element1 + element2;
    }
}
As an option, we can also provide the following factory, so that code consumers have an easy way to instantiate different kinds of accumulators:
class AccumulatorFactory() {
    Accumulator createMaximumAccumulator() {
        return new Accumulator(new MaximumCombinator());
    }

    Accumulator createSumAccumulator() {
        return new Accumulator(new SumAccumulator());
    }
}
Note how, by removing class inheritance, the code has improved in several ways:
  • Class Accumulator and implementations of Combinator are not coupled anymore. Now the combinators can be reused outside of this particular context.
  • The responsibilities are more focused. So it is easier to have class names which faithfully express their implementation's intent.
  • Unit testing will be simpler. First, since Accumulator is not abstract anymore, it can now be tested in isolation. Tests of SumAccumulator and MaximumAccumulator would have been unnecessarily complex to write and redundant since they need to exercise the common inherited code twice. Now we can easily test the combinator's code.

By allowing the father's code to call some method defined in its children, the abstract keyword couples classes even more tightly than simple inheritance does. As a first conclusion of this example and the previous one, we can say the abstract keyword represents a strong code smell. As a coding rule, it seems safer to avoid it altogether.
Naval strategy (Yi Sun-sin turtle ship)


A side note on unit testing

The previous example briefly mentioned the topic of writing unit tests. So I am taking this opportunity to quickly discuss the impact of inheritance on tests.

I am a mockist. Writing unit tests the mockist style consists in replacing all dependencies by mock objects. Each mock object behaviour is described on a test by test basis by succinct propositions. The goal is to write tests as unitarily as possible, so that:
  • only few tests fail whenever a code change which breaks past behaviour occurs,
  • after each code change, it is easy to know which test suite should be run to detect potential regressions,
  • tests run faster,
  • and most importantly, the cost of writing and maintaining tests does not escalate with the program's size (or the classes' dependence depth).
So, from the mockist standpoint, inheritance is a problem. Inheritance allows to split code among separate units, but prevents from testing these units in isolation. Tests of the descendants in a class hierarchy are necessarily redundant since they exercise the code of the ancestor's classes multiple times.

Circus by Marc Chagall

Sunday, May 10, 2015

Economy of Means: On the Elimination of Inheritance (1/6)

Introduction

Today I bring an end to a question I have been pondering for more than a decade. It started as soon as my first real-life software projects, at the time in C#:
Should this piece really be coded with inheritance? Wouldn't composition work just fine? On which grounds can the choice be made?

High level languages, such as C#, are supposed to make your life easier. They provide succinct syntax for powerful yet simple abstract concepts which are automatically translated, by the compiler or interpreter, into long lists of low level machine instructions. At least, that is the theory. In practice, high-level languages are shipped with a variety of shiny new constructs, which reflect the tastes of their inventor, or the fashion of the time. Junky programmers will just happily jump on the band-wagon, using any gadget to quickly produce what I see as no more than unmaintainable disposable paper software.
On the contrary, thorough programmers, which need to produce larger software (many functionalities, several team members, and at least a few years of maintenance), must painfully learn the correct way to use their tools. The school of thought I adhere to is minimalism. Hence, I personally believe that only a few well-chosen orthogonal constructs in any programming language are always sufficient. Having experienced various flavours (imperative, pure, strongly typed, static, dynamic, modular, object oriented, functional, event-based, declarative, synchronous, relational...), I am not afraid to reduce (mutilate according to language zealots) any programming language syntax in order to unveil its core paradigm. For every construct, I like to decide whether it is possible to do without or at least delineate the precise conditions that make it applicable. The benefits of restricting syntax to its bare minimum are greater code homogeneity, increased readability, and shorter learning curve for team newcomers or language beginners. Isn't reduced risk of astonishment a worthwhile goal in itself?
Here is a list of language options which generate dilemma (at least for me):
  • possibility to omit braces for one-line blocks,
  • initialization next to the field declaration (rather than in the constructor),
  • for loop (rather than while),
  • goto statement (rather than structured programming),
  • global variables (see A case against static),
  • static methods,
  • Long (rather than long),
  • struct (rather than classes in C#),
  • casts (rather than generics/templates),
  • HashMap (rather than Hashtable in Java),
  • objects (rather than modules in OCaml),
  • decorators as annotations (in Python),
  • new (rather than factory constructor in Javascript),
  • ...
Inheritance contributes to the existing plethora of paradoxes of choice.

If class B inherits class A, then it receives all methods and properties of A. Class A is called the base class (or father); class B is the child. From a software design perspective, a class hierarchy with only two elements does not make much value. So let us add another offspring of A named class C. Following UML conventions, this simple canonical hierarchy is represented by the following schema:

In various situations, this triptych, is often, unfortunately, chosen as the design by default. In the following, I will describe alternative and often preferable designs that avoid inheritance. Through these examples we will gradually grasp the true nature of inheritance.

Side note: Although, all code is written with the Java syntax, the same concepts are present in most mainstream object oriented programming languages.

A bit of reductionism

Deaf to YAGNI

In any code base, you sometimes (more than you would ever have wished for) encounter a degenerate class hierarchy. That is a hierarchy with only two members: the father and its son. In the best case, it is an intermediate state after a previous unfinished refactoring session. Just after a dead classes has been removed. But most often, it is a consequence of programmers not living in the present: in anticipation of a future evolution of the software, they just made it unnecessarily complex.
The most recent encounter with this pattern that I remember is this:
  • a class UserManager with most methods abstract,
  • and a single implementation RemoteUserManager.
The cure is easy: merge the two classes and get rid of all unnecessary syntax. In the process, we picked a more telling name for the resulting class by calling it RemoteAuthenticationService.

The future is not easily predicted, even more in software development. Today's preparation for a potential evolution often turns out to be a hindrance in meeting tomorrow's concrete requirements. To make a physical analogy, it is like walking further in a direction because you think that is where you will be asked to go next, but then having to walk back an opposite path when the destination changes.

Euclid said it best:
"In any triangle two sides taken together in any manner are greater than the remaining one."


Simply put, stay focused on implementing just enough code to meet the current required behaviour. It is a safer strategy, by a large margin. At least that is what Yagni says.

DRYing the wrong way

Frequently, small helper functions are found in the parent of a class hierarchy. For instance, a function that converts hours into milliseconds, could belong to a hierarchy of Tasks. Or a function which formats speeds either as kilometers or miles per hour, could be part of a hierarchy of Vehicules.
Coders did that probably because they were in a hurry. It is so easy to move up common code in a parent class. At least, didn't they resort to copy-pasting.

First, as a general rule of thumb, methods which do not access the object state (through keyword "this"), most probably do not belong to the correct class. Helper methods couple the whole hierarchy with unrelated code. Hence, helper methods are clearly not a recommended usage of inheritance and can advantageously be replaced by composition. Composition will let you write less monolithic code by distributing the responsibilities into distinct smaller and reusable class.
However, pulling up helper methods may be a useful intermediate step when refactoring unmaintainable code.

A matter of choice

Sometimes, we want to store different kinds of data in a single collection. For instance, we could have various types of shapes (Triangle, Square, Circle...). Then, we can define a common interface Shape and have all objects implement it. Interface inheritance is the OO way of providing union types.

At other times, various objects must be processed to compute results of similar nature. Building on the previous example, we could need to compute the shapes' respective surfaces. Then, we simply define method computeSurface on the Shape interface and have each shape implement it its own way.
Often, the treatment can not be directly coded inside the objects, but lies somewhere else. Consider, for instance, a compiler which outputs a syntax tree, or a database which accepts commands. In order to be able to code the treatment, some kind of dispatch mechanism is needed. The naive implementation consists in branching according to the object type. Since its recommendation by the GOF, the visitor pattern has become a popular design. However, I find a map from object types to the appropriate treatment proves to be a much more flexible and less intrusive implementation. But that's material for another post.

Interface inheritance also allows the construction of complex recursive data-structures. To give a simple example, consider arithmetic expressions which are comprised of:
  • literals, such as constants integers,
  • compound expressions such as the binary addition of two other expressions.
With these heterogeneous pieces of data unified under a common type, we could easily perform a variety of operations: evaluation into an integer, conversion into a string representation, height count, size (total number of nodes) count, translation to yet some other formalism, symbolic manipulations (such as factorisation)...
And of course, we could also store several expressions into a unique worklist.

The two opposite sides of inheritance


As a summary for this first set of simple examples, I would like to point out that inheritances mainly serves two orthogonal purposes:
  • reuse (or factor) common code or data,
  • union distinct objects under the same type.
More complex inheritance usages are always a subtle combination of these two fundamental aspects in various proportions.
As such, inheritance is a language feature which is not atomic. It can be expressed as a combination of:
  • code composition for the factoring aspect,
  • interface inheritance (or simply duck-typing in more dynamic OO objects) for the union aspect,
  • and some glue code, mainly to forward calls.
As we will see in the following, more complex examples, it proves almost always preferable to break down inheritance into its constituents. The design becomes more focused and readier for change.



Note: In order to make it more easily digestible, this topic will be split into 6 different posts.

 Next