Sunday, May 10, 2015

Economy of Means: On the Elimination of Inheritance (1/6)


Today I bring an end to a question I have been pondering for more than a decade. It started with my first real-life software projects, written in C# at the time:
Should this piece really be coded with inheritance? Wouldn't composition work just fine? On which grounds can the choice be made?

High-level languages, such as C#, are supposed to make your life easier. They provide succinct syntax for powerful yet simple abstract concepts which are automatically translated, by the compiler or interpreter, into long lists of low-level machine instructions. At least, that is the theory. In practice, high-level languages are shipped with a variety of shiny new constructs, which reflect the tastes of their inventor or the fashion of the time. Junky programmers will just happily jump on the bandwagon, using any gadget to quickly produce what I see as no more than unmaintainable, disposable paper software.
On the contrary, thorough programmers, who need to produce larger software (many functionalities, several team members, and at least a few years of maintenance), must painfully learn the correct way to use their tools. The school of thought I adhere to is minimalism. Hence, I personally believe that a few well-chosen orthogonal constructs are sufficient in any programming language. Having experienced various flavours (imperative, pure, strongly typed, static, dynamic, modular, object oriented, functional, event-based, declarative, synchronous, relational...), I am not afraid to reduce (mutilate, according to language zealots) any programming language syntax in order to unveil its core paradigm. For every construct, I like to decide whether it is possible to do without it, or at least to delineate the precise conditions that make it applicable. The benefits of restricting syntax to its bare minimum are greater code homogeneity, increased readability, and a shorter learning curve for team newcomers or language beginners. Isn't a reduced risk of astonishment a worthwhile goal in itself?
Here is a list of language options which raise a dilemma (at least for me):
  • possibility to omit braces for one-line blocks,
  • initialization next to the field declaration (rather than in the constructor),
  • for loop (rather than while),
  • goto statement (rather than structured programming),
  • global variables (see A case against static),
  • static methods,
  • Long (rather than long),
  • struct (rather than classes in C#),
  • casts (rather than generics/templates),
  • HashMap (rather than Hashtable in Java),
  • objects (rather than modules in OCaml),
  • decorators as annotations (in Python),
  • new (rather than factory constructor in Javascript),
  • ...
Inheritance contributes to the existing plethora of paradoxes of choice.

If class B inherits from class A, then it receives all methods and properties of A. Class A is called the base class (or father); class B is the child. From a software design perspective, a class hierarchy with only two elements does not bring much value. So let us add another offspring of A named class C. In UML terms, this simple canonical hierarchy is a parent A with two children, B and C, each linked to A by a generalization arrow.

In various situations, this triptych is unfortunately often chosen as the default design. In the following, I will describe alternative and often preferable designs that avoid inheritance. Through these examples we will gradually grasp the true nature of inheritance.

Side note: Although all code is written in Java syntax, the same concepts are present in most mainstream object-oriented programming languages.
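
For reference, the canonical hierarchy described above could be written as follows. This is only a minimal sketch: the `name` field and the `describe` method are invented for illustration and do not come from any real code base.

```java
// Base class A: everything declared here is received by every child.
class A {
    protected final String name; // state shared by the whole hierarchy

    A(String name) {
        this.name = name;
    }

    String describe() {
        return "I am " + name;
    }
}

// B inherits describe() unchanged.
class B extends A {
    B() {
        super("B");
    }
}

// C inherits describe() as well, and overrides it to specialise the behaviour.
class C extends A {
    C() {
        super("C");
    }

    @Override
    String describe() {
        return super.describe() + ", a specialised child";
    }
}
```

Both `new B()` and `new C()` can be stored in a variable of type `A`, which already hints at the two roles inheritance plays: sharing code and unifying types.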

A bit of reductionism

Deaf to YAGNI

In any code base, you sometimes (more often than you would ever have wished) encounter a degenerate class hierarchy, that is, a hierarchy with only two members: the father and its son. In the best case, it is an intermediate state left by an unfinished refactoring session, just after a dead class has been removed. But most often, it is a consequence of programmers not living in the present: in anticipation of a future evolution of the software, they made it unnecessarily complex.
The most recent encounter with this pattern that I remember is this:
  • a class UserManager with most methods abstract,
  • and a single implementation RemoteUserManager.
The cure is easy: merge the two classes and get rid of all unnecessary syntax. In the process, we picked a more telling name for the resulting class by calling it RemoteAuthenticationService.
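
The merge could look roughly like the sketch below. Only the three class names come from the story above. The `authenticate` method and the `remoteDirectory` map, a stand-in for whatever remote server the real code talked to, are assumptions made purely for illustration.

```java
import java.util.Map;

// Before: a degenerate hierarchy, an abstract father and its only son.
abstract class UserManager {
    abstract boolean authenticate(String login, String password);
}

class RemoteUserManager extends UserManager {
    // Hypothetical stand-in for the remote server: login -> password.
    private final Map<String, String> remoteDirectory;

    RemoteUserManager(Map<String, String> remoteDirectory) {
        this.remoteDirectory = remoteDirectory;
    }

    @Override
    boolean authenticate(String login, String password) {
        return password != null && password.equals(remoteDirectory.get(login));
    }
}

// After: one concrete class, no abstract method, no override, a telling name.
class RemoteAuthenticationService {
    // Same hypothetical stand-in for the remote server.
    private final Map<String, String> remoteDirectory;

    RemoteAuthenticationService(Map<String, String> remoteDirectory) {
        this.remoteDirectory = remoteDirectory;
    }

    boolean authenticate(String login, String password) {
        return password != null && password.equals(remoteDirectory.get(login));
    }
}
```

The behaviour is unchanged; what disappears is the `abstract` keyword, the `extends` clause, the `@Override` annotation and one whole class to read.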

The future is not easily predicted, least of all in software development. Today's preparation for a potential evolution often turns out to be a hindrance in meeting tomorrow's concrete requirements. To make a physical analogy, it is like walking further in a direction because you think that is where you will be asked to go next, and then having to walk all the way back when the destination changes.

Euclid said it best:
"In any triangle two sides taken together in any manner are greater than the remaining one."

Simply put, stay focused on implementing just enough code to meet the currently required behaviour. It is a safer strategy, by a large margin. At least that is what YAGNI says.

DRYing the wrong way

Frequently, small helper functions are found in the parent of a class hierarchy. For instance, a function that converts hours into milliseconds could belong to a hierarchy of Tasks, or a function that formats speeds in either kilometers or miles per hour could be part of a hierarchy of Vehicles.
The coders probably did that because they were in a hurry: it is so easy to move common code up into a parent class. At least they did not resort to copy-pasting.

As a general rule of thumb, methods which do not access the object state (through the keyword "this") most probably do not belong to the correct class. Helper methods couple the whole hierarchy with unrelated code. Hence, helper methods are clearly not a recommended usage of inheritance and can advantageously be replaced by composition. Composition will let you write less monolithic code by distributing the responsibilities into distinct, smaller and reusable classes.
However, pulling up helper methods may be a useful intermediate step when refactoring unmaintainable code.
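
To illustrate with the hours-to-milliseconds helper mentioned above, here is a minimal sketch of the composition alternative. The names `TimeConverter`, `BackupTask` and `periodInMillis` are hypothetical, chosen only for this example.

```java
// The conversion reads no object state, so it gets a small class of its own
// instead of sitting in the parent of the Task hierarchy.
class TimeConverter {
    private static final long MILLIS_PER_HOUR = 60L * 60L * 1000L;

    long hoursToMillis(long hours) {
        return hours * MILLIS_PER_HOUR;
    }
}

// A task uses the converter by composition: it holds a reference to it
// rather than inheriting the helper from a common parent.
class BackupTask {
    private final TimeConverter converter;
    private final long periodInHours;

    BackupTask(TimeConverter converter, long periodInHours) {
        this.converter = converter;
        this.periodInHours = periodInHours;
    }

    long periodInMillis() {
        return converter.hoursToMillis(periodInHours);
    }
}
```

The converter can now be reused, tested and replaced without touching any task, and tasks no longer need a common parent just to share it.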

A matter of choice

Sometimes, we want to store different kinds of data in a single collection. For instance, we could have various types of shapes (Triangle, Square, Circle...). Then, we can define a common interface Shape and have all objects implement it. Interface inheritance is the OO way of providing union types.

At other times, various objects must be processed to compute results of a similar nature. Building on the previous example, we could need to compute the shapes' respective surfaces. Then, we simply define a method computeSurface on the Shape interface and have each shape implement it in its own way.
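
A minimal sketch of this design follows. `Shape`, `computeSurface` and the shape names come from the text; the fields, the constructors and the `Drawing` collection class are assumptions added to make the example complete.

```java
import java.util.ArrayList;
import java.util.List;

// The common type under which all shapes are unified.
interface Shape {
    double computeSurface();
}

class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double computeSurface() {
        return side * side;
    }
}

class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double computeSurface() {
        return Math.PI * radius * radius;
    }
}

class Triangle implements Shape {
    private final double base;
    private final double height;

    Triangle(double base, double height) {
        this.base = base;
        this.height = height;
    }

    @Override
    public double computeSurface() {
        return 0.5 * base * height;
    }
}

// A single collection holding different kinds of shapes (hypothetical class).
class Drawing {
    private final List<Shape> shapes = new ArrayList<>();

    void add(Shape shape) {
        shapes.add(shape);
    }

    double totalSurface() {
        double total = 0.0;
        for (Shape shape : shapes) {
            total += shape.computeSurface(); // each shape answers in its own way
        }
        return total;
    }
}
```

Note that no code is shared between the shapes here: the interface is used purely for its union aspect.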
Often, the treatment cannot be coded directly inside the objects, but lies somewhere else. Consider, for instance, a compiler which builds a syntax tree, or a database which accepts commands. In order to be able to code the treatment, some kind of dispatch mechanism is needed. The naive implementation consists in branching according to the object type. Since its recommendation by the GoF, the visitor pattern has become a popular design. However, I find that a map from object types to the appropriate treatment is a much more flexible and less intrusive implementation. But that's material for another post.
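
To give a first idea of the map-based dispatch, here is one possible sketch, not necessarily the design a later post would settle on. Everything in it (`Command`, `Insert`, `Delete`, `Dispatcher`, `register`, `dispatch`) is hypothetical, and the lookup is deliberately simplistic: it matches the exact runtime class only, so subclasses of a registered type are not found.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical database commands: plain data, with no treatment coded inside.
interface Command {
}

class Insert implements Command {
    final String row;

    Insert(String row) {
        this.row = row;
    }
}

class Delete implements Command {
    final int id;

    Delete(int id) {
        this.id = id;
    }
}

// Maps an object type to the treatment to apply to objects of that type.
class Dispatcher<R> {
    private final Map<Class<?>, Function<Object, R>> treatments = new HashMap<>();

    <T> void register(Class<T> type, Function<? super T, ? extends R> treatment) {
        // The cast is safe: the treatment is only ever looked up under `type`.
        treatments.put(type, object -> treatment.apply(type.cast(object)));
    }

    R dispatch(Object object) {
        Function<Object, R> treatment = treatments.get(object.getClass());
        if (treatment == null) {
            throw new IllegalArgumentException(
                    "No treatment registered for " + object.getClass().getName());
        }
        return treatment.apply(object);
    }
}
```

A possible usage: after `dispatcher.register(Insert.class, insert -> "INSERT " + insert.row)`, calling `dispatcher.dispatch(new Insert("alice"))` applies that treatment. The command classes need no `accept` method, which is what makes this approach less intrusive than a visitor.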

Interface inheritance also allows the construction of complex recursive data-structures. To give a simple example, consider arithmetic expressions which are comprised of:
  • literals, such as integer constants,
  • compound expressions such as the binary addition of two other expressions.
With these heterogeneous pieces of data unified under a common type, we could easily perform a variety of operations: evaluation into an integer, conversion into a string representation, height count, size (total number of nodes) count, translation to yet some other formalism, symbolic manipulations (such as factorisation)...
And of course, we could also store several expressions in a single worklist.
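
The arithmetic expressions described above can be sketched as follows. The type names `Expression`, `Literal` and `Addition`, as well as the four operations shown, are assumptions picked from the list of possible treatments; the convention that a lone literal has height 1 is also an arbitrary choice.

```java
// The common type: every node of the tree is an Expression.
interface Expression {
    int evaluate();   // evaluation into an integer
    String show();    // conversion into a string representation
    int height();     // length of the longest path down to a literal
    int size();       // total number of nodes
}

// Leaf of the tree: an integer constant.
class Literal implements Expression {
    private final int value;

    Literal(int value) {
        this.value = value;
    }

    @Override
    public int evaluate() {
        return value;
    }

    @Override
    public String show() {
        return Integer.toString(value);
    }

    @Override
    public int height() {
        return 1;
    }

    @Override
    public int size() {
        return 1;
    }
}

// Compound node: refers to two other expressions, hence the recursion.
class Addition implements Expression {
    private final Expression left;
    private final Expression right;

    Addition(Expression left, Expression right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public int evaluate() {
        return left.evaluate() + right.evaluate();
    }

    @Override
    public String show() {
        return "(" + left.show() + " + " + right.show() + ")";
    }

    @Override
    public int height() {
        return 1 + Math.max(left.height(), right.height());
    }

    @Override
    public int size() {
        return 1 + left.size() + right.size();
    }
}
```

For instance, `new Addition(new Literal(1), new Addition(new Literal(2), new Literal(3)))` evaluates to 6 and is shown as `(1 + (2 + 3))`.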

The two opposite sides of inheritance

As a summary of this first set of simple examples, I would like to point out that inheritance mainly serves two orthogonal purposes:
  • reuse (or factor) common code or data,
  • union distinct objects under the same type.
More complex inheritance usages are always a subtle combination of these two fundamental aspects in various proportions.
As such, inheritance is a language feature which is not atomic. It can be expressed as a combination of:
  • code composition for the factoring aspect,
  • interface inheritance (or simply duck-typing in more dynamic OO languages) for the union aspect,
  • and some glue code, mainly to forward calls.
As we will see in the following posts, with more complex examples, it almost always proves preferable to break down inheritance into its constituents. The design becomes more focused and readier for change.
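
As a small foretaste, the three constituents can be sketched on the Vehicles example from earlier. All names below (`Vehicle`, `SpeedFormatter`, `Car`, `Bicycle`, `displaySpeed`) are hypothetical, and the rounding to whole units is an arbitrary simplification.

```java
// Union aspect: a pure interface, with no code to inherit.
interface Vehicle {
    String displaySpeed();
}

// Factoring aspect: the shared code lives in a component, not in a parent class.
class SpeedFormatter {
    private static final double KILOMETERS_PER_MILE = 1.609344;
    private final boolean metric;

    SpeedFormatter(boolean metric) {
        this.metric = metric;
    }

    String format(double kilometersPerHour) {
        if (metric) {
            return Math.round(kilometersPerHour) + " km/h";
        }
        return Math.round(kilometersPerHour / KILOMETERS_PER_MILE) + " mph";
    }
}

class Car implements Vehicle {
    private final SpeedFormatter formatter;
    private final double kilometersPerHour;

    Car(SpeedFormatter formatter, double kilometersPerHour) {
        this.formatter = formatter;
        this.kilometersPerHour = kilometersPerHour;
    }

    @Override
    public String displaySpeed() {
        return formatter.format(kilometersPerHour); // glue: forward the call
    }
}

class Bicycle implements Vehicle {
    private final SpeedFormatter formatter;
    private final double kilometersPerHour;

    Bicycle(SpeedFormatter formatter, double kilometersPerHour) {
        this.formatter = formatter;
        this.kilometersPerHour = kilometersPerHour;
    }

    @Override
    public String displaySpeed() {
        return formatter.format(kilometersPerHour); // glue: forward the call
    }
}
```

The price is a few lines of forwarding code in each class. In exchange, the formatter can change or be swapped per object without any impact on the `Vehicle` type.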

Note: In order to make it more easily digestible, this topic will be split into 6 different posts.

