Whenever I hear someone boast about the huge number of lines he produced for his last project, I silently rejoice at not having to take over his code. To be honest, like most industry programmers, I have often been asked to extend code I did not write in the first place. Maybe you have too. If so, you know how often one experiences moments of profound despair, poring over endless lines of incomprehensible code. For those of you lucky enough not to know this feeling, let me just say that, every single time, I dug up countless skeletons, many more than hide in the closet of your most crooked politician. Sometimes, I believe that funambulism while spinning plates might even be easier. A single wrong move and everything collapses. Invariants break, exceptions fly and your manager loses several days of life expectancy.
Working on someone else's code, harder than jultagi?
Let us now be honest and consider the reverse situation. I vividly remember when one of my coworkers confessed, slightly embarrassed, what he really thought about my code. He found it simply unintelligible. I was taken aback! Until then, I had no idea that what I took great pride in could be so unpleasant for someone else. You may laugh at me, but frankly, do you still easily understand code you wrote only 6 months ago?
In fact, writing a piece of code is an order of magnitude easier than reading it. In that respect, some languages fare worse than others: Perl, for instance, is renowned as being write-only. On top of that, beauty is said to lie in the eye of the beholder. So objective beauty, even for software, cannot exist! For a very long time, I thought so. And it was extremely frustrating. I was not able to rationally explain why some piece of code did not feel right. Neither could I show other programmers how to improve their code writing skills. My only alternative was to lead by example or resort to arguments from authority.
Now, I believe that code beauty rests on only 3 principles. They are, if you will, the golden ratios of software. Despite their relative simplicity, they are surprisingly rarely followed by programmers, even advanced ones. However, if applied systematically, they invariably lead you to beautifully crafted pieces of code. With them, I gained a sense of direction that I hope to convey in this post.
A golden ratio for software?
My three golden principles of code beauty are:
- Less is more,
- Local is manageable,
- Innovation is risk.
Local is manageable, also known as the separation of concerns principle, means you should structure your code in classes of limited size. At the same time, coupling between classes should be minimized. In C# this implies that method access modifiers are chosen in this order: private first, then private protected (C# 7.2 and later), internal, protected, protected internal and finally public. Note that protected internal grants access to the whole assembly and to all subclasses, so it is wider than either internal or protected alone. Chinese junks were probably the first ships divided into watertight compartments. If the hull was damaged in one place, these subdivisions would circumscribe the flooding and help prevent the ship from sinking. In the same way, software compartmentalization limits the propagation of bugs. Code modifications can be performed locally, and small parts replaced easily. In great software, richness in behaviour is achieved by the combination of multiple small and simple components rather than by the gradual stratification of code into a monolithic and complex ensemble.
In this age of unlimited belief in the virtues of progress, you probably won't often hear the last principle: Innovation is risk. However, it should be common sense that every technology comes with its own risks. In that respect software is no different. If I can implement all the required functionality with some integer fields, classes and methods, I am the happiest man. I do not enjoy worrying about when to release files, how floating-point numbers are rounded, which method is called through a delegate or whether my threads will deadlock... Most programmers are immediately aware of the benefits of advanced language constructs. But few, myself included, really understand how to avoid misusing them. Handle gadgets with care and maintain a sense of mistrust towards novelty!
I find these three principles particularly relevant because they match some characteristics of the human mind, namely:
- the inability to keep track of the state of more than a few items at the same time (due to the limits of our working memory),
- the aptitude to consider a system at various scales,
- the tendency to replicate similar behavior and to hold default positions.
To sum up, between several pieces of code that perform the exact same task, I prefer the smallest, most structured and most conventional one. So I try to write code that complies with these principles. Admittedly, it takes a bit longer than just writing something that works. But because I have learned how hard it can be to read code, I know that the gain in readability pays off quickly and generously.
As you may know, writing crystal clear code right away is almost impossible. More importantly, software has a natural tendency to grow. So much so that coding can be considered a constant fight against entropy! To avoid code decay, regular refactoring is recommended. Refactoring techniques are code transformations which preserve the external behavior of programs. Together with non-regression tests, they are a fundamental tool to clean up code without breaking it. Almost every refactoring has its inverse transformation. So, without a clear objective in mind, it can be confusing to know which one to choose. Fortunately, when guided by the three aforementioned principles, this dilemma disappears. Let me end this post with a list of a few refactoring techniques. If you want to explore this topic in more depth, SourceMaking is a good place to start.
As you will see, most refactoring techniques are straightforward. But, as I like to say, there are no small simplifications. Small steps have a low cost and can be easily made. Yet, they often open up opportunities for bigger transformations. Also consider the fact that any major code change can always be broken down into a succession of very small steps.
First, here are a few refactorings that reduce the level of access modifiers:
- if a public method is never called outside of its assembly, then make it internal,
- if an internal method is never called outside of its subclasses, then make it private protected (available since C# 7.2; protected internal would widen access rather than narrow it),
- if an internal method is never called outside of its class, then make it private,
- if a private method is never called, then remove it,
- make all fields private and create setters/getters whenever necessary.
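As a rough sketch of where these access-modifier reductions lead, consider the class below. `Account`, `Deposit` and `CheckPositive` are hypothetical names invented for this illustration, and I simply assume that `Deposit` is still called from other assemblies.

```csharp
using System;

// Hypothetical class, for illustration only.
public class Account
{
    // The field is private; the outside world only gets a getter.
    private int balance;

    public int Balance
    {
        get { return this.balance; }
    }

    // Assumed to be called from other assemblies, so it stays public.
    public void Deposit(int amount)
    {
        this.CheckPositive(amount);
        this.balance += amount;
    }

    // Only ever called from inside Account, so private is enough.
    private void CheckPositive(int amount)
    {
        if (amount <= 0)
        {
            throw new ArgumentOutOfRangeException("amount");
        }
    }
}
```

Nothing outside `Account` can now put `balance` into an inconsistent state, and `CheckPositive` can be renamed or removed without looking at any other class.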
- if a field is used in only one method, then make it a local variable of the method,
- if a local variable is assigned once and then immediately used, then remove it,
- if the same value is passed around as a parameter of several methods, then introduce a field to hold it. This may be a good indication that the class could be split in two parts: one to handle all computations related to this value and the remaining.
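The first two scope reductions above can be sketched as follows. The `Invoice` class and the flat 10% tax are assumptions made up for the example; the commented lines show what the method looked like before.

```csharp
// Hypothetical class, for illustration only.
public class Invoice
{
    private readonly int unitPrice;
    private readonly int quantity;

    public Invoice(int unitPrice, int quantity)
    {
        this.unitPrice = unitPrice;
        this.quantity = quantity;
    }

    // Before, 'subtotal' was a field that only this method touched,
    // and 'result' was a local assigned once and returned at once:
    //     this.subtotal = this.unitPrice * this.quantity;
    //     int result = this.subtotal + this.subtotal / 10 + shipping;
    //     return result;
    public int Total(int shipping)
    {
        // 'subtotal' is used twice below, so it stays, but as a local.
        int subtotal = this.unitPrice * this.quantity;

        // Assumed flat 10% tax, using integer division.
        return subtotal + subtotal / 10 + shipping;
    }
}
```

With these values, `new Invoice(20, 5).Total(7)` gives 100 + 10 + 7 = 117, and the class carries one field fewer.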
- some fields are initialized in the class constructors and then never modified. This is in particular the case for all fields in "functional" objects. These fields should be made readonly,
- other fields hold the state of the object and are constantly modified. Remove any such field when its value can be obtained by a computation from other fields in the class, and replace it with a getter. This simplification removes the difficulty of preserving complex invariants between different pieces of data.
To make things more concrete, here is a first example:

```csharp
class Point
{
    int x;
    int y;
    // Squared distance from the origin, recomputed by hand after every move.
    int distance;

    public void ShiftHorizontally(int deltaX)
    {
        this.x += deltaX;
        this.distance = this.x*this.x + this.y*this.y;
    }

    public void ShiftVertically(int deltaY)
    {
        this.y += deltaY;
        this.distance = this.x*this.x + this.y*this.y;
    }
}
```

It can be simplified into:

```csharp
class Point
{
    int x;
    int y;

    // Computed on demand, so it can never be out of sync with x and y.
    int Distance { get { return this.x*this.x + this.y*this.y; } }

    public void ShiftHorizontally(int deltaX)
    {
        this.x += deltaX;
    }

    public void ShiftVertically(int deltaY)
    {
        this.y += deltaY;
    }
}
```
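Here is a second example:

```csharp
class Person
{
    int wealth;
    bool IsRich { get { return this.wealth > 1000000000; } }

    Person taylor;
    // Cached copy: becomes stale as soon as the taylor's wealth changes.
    bool hasRichTaylor;

    public void ChangeTaylor(Person taylor)
    {
        this.taylor = taylor;
        this.hasRichTaylor = taylor.IsRich;
    }
}
```

It becomes:

```csharp
class Person
{
    int wealth;
    bool IsRich { get { return this.wealth > 1000000000; } }

    Person taylor;
    // Always up to date, because it is read from the taylor on demand.
    bool HasRichTaylor { get { return this.taylor.IsRich; } }

    public void ChangeTaylor(Person taylor)
    {
        this.taylor = taylor;
    }
}
```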
- Sometimes, you can decrease the number of occurrences of a field by replacing several method calls with a single call to a richer method (for instance, prefer one call to AddRange over several calls to Add).
- Finally, I prefer automatic setters/getters over fields whenever possible. I would write:

```csharp
class Tree
{
    public int Height { private set; get; }
}
```

rather than:

```csharp
class Tree
{
    private int height;
    public int Height { get { return this.height; } }
}
```
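To illustrate the readonly and AddRange advice above, here is a minimal sketch. The `Playlist` class and its members are hypothetical and only serve the example; `List<T>.AddRange` is the standard .NET method.

```csharp
using System.Collections.Generic;

// Hypothetical class, for illustration only.
public class Playlist
{
    // Assigned once in the constructor and never reassigned: readonly.
    private readonly string name;

    // The reference never changes, although the list contents do.
    private readonly List<string> tracks = new List<string>();

    public Playlist(string name, IEnumerable<string> initialTracks)
    {
        this.name = name;

        // One call to the richer AddRange instead of a loop of Add calls,
        // so the field 'tracks' is mentioned once here.
        this.tracks.AddRange(initialTracks);
    }

    public string Name
    {
        get { return this.name; }
    }

    public int Count
    {
        get { return this.tracks.Count; }
    }
}
```

The compiler now rejects any later assignment to `name` or `tracks`, which is one less invariant to keep in my head.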
- if you access a field through a chain of fields, such as a.b.c, then define a direct getter C,
- if you evaluate the same expression several times, such as this.LineHeight*this.lines, then define a getter this.Height,
- if you have two loops with the same body but different ending conditions, then make a method with the loop boundary as parameter.
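The first two items of this list could look like the sketch below. `Font` and `TextBlock` are hypothetical classes chosen to match the `LineHeight` and `Height` names used above; they are not taken from any real library.

```csharp
// Hypothetical classes, for illustration only.
public class Font
{
    public int LineHeight { get; private set; }

    public Font(int lineHeight)
    {
        this.LineHeight = lineHeight;
    }
}

public class TextBlock
{
    private readonly Font font;
    private readonly int lines;

    public TextBlock(Font font, int lines)
    {
        this.font = font;
        this.lines = lines;
    }

    // Direct getter: callers no longer walk the chain block.font.LineHeight.
    public int LineHeight
    {
        get { return this.font.LineHeight; }
    }

    // The repeated expression this.LineHeight * this.lines gets a name.
    public int Height
    {
        get { return this.LineHeight * this.lines; }
    }
}
```

Callers only ever see `TextBlock`, so `Font` could later be replaced without touching them.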
Lastly, moving code from several callers into the body of the called method is a very effective way to reduce code. There are many variants of this refactoring, but the idea is always the same:
- if all calls to f are followed by the same block of instructions, then push this block down into the body of f. Sometimes, you may have to add some parameters to f,
- if every creation of an instance of class A is followed by a call to some method of A, then call this method directly from the constructor of A,
- if method g is always called just after method f, then merge both methods and their parameters together,
- if the result of a method f is always used as the argument of a method g (in other words you have g(f(x))), then move the call to f inside g and change the signature of g,
- if a method g is always called with the same constant parameter g(c), then remove the parameter and push the constant down inside g,
- if a method g always takes a new object as argument, as in g(new MyObject(x)), then the signature of g is not well chosen, and the object should rather be created inside g.
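Two of these variants, calling a method from the constructor and pushing a constant down, can be sketched on a small hypothetical `Report` class. The names and the "- " prefix are assumptions made for the illustration.

```csharp
using System.Collections.Generic;

// Hypothetical class, for illustration only.
public class Report
{
    private readonly List<string> lines = new List<string>();

    // Before: every "new Report()" was followed by "report.AddHeader()".
    // The call now lives in the constructor and AddHeader became private.
    public Report()
    {
        this.AddHeader();
    }

    // Before: AddItem(text, prefix) was always called with prefix "- ".
    // The constant was pushed down and the parameter removed.
    public void AddItem(string text)
    {
        this.lines.Add("- " + text);
    }

    public string Text
    {
        get { return string.Join("\n", this.lines); }
    }

    private void AddHeader()
    {
        this.lines.Add("REPORT");
    }
}
```

Each call site loses a line or an argument, and forgetting the header is no longer possible.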
- try to avoid the null value. There are very few cases when you really need it. If you ensure your variables are always initialized, null testing can be removed altogether. In a later post, I will explain why I am generally not in favor of defensive programming,
- rather than checking that a method argument is valid and then performing some computation, it is nicer to filter out incorrect argument values and return early. More concretely, the following code:

```csharp
int result = 0;
List<int> content = this.CurrentData;
if (content != null)
{
    if (content.Count > 0)
    {
        result = content[0];
    }
}
return result;
```

can be rewritten as:

```csharp
if (this.CurrentData == null) return 0;
if (this.CurrentData.Count <= 0) return 0;
return this.CurrentData[0];
```
- obviously, when the true branch of an if then else ends with a breaking instruction, the else branch is not necessary:

```csharp
if (condition)
{
    // do something
    continue; // or break; or return;
}
else
{
    // do something else
}
```

becomes:

```csharp
if (condition)
{
    // do something
    continue; // or break; or return;
}
// do something else
```
- when two branches of an if then else have some code in common, try to move this code out, either before or after the conditional block. Example:

```csharp
if (condition)
{
    f(a, b, e1);
}
else
{
    f(a, b, e2);
}
```

becomes:

```csharp
int x;
if (condition)
{
    x = e1;
}
else
{
    x = e2;
}
f(a, b, x);
```
- if a condition inside a loop does not depend on the loop iteration, then try to put it before the loop,
- when you have a sequence with a lot of if else if else if else, try to use a switch statement instead,
- Lastly, always prefer an enum over an integer or a string: the risk of forgetting to handle a case is lower.
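The last two items, switch instead of an if/else chain and enum instead of integer or string, combine naturally. In this sketch the `Shape` enum and the `SideCounter` class are hypothetical.

```csharp
using System;

// Hypothetical types, for illustration only.
public enum Shape
{
    Circle,
    Square,
    Triangle
}

public class SideCounter
{
    // A switch over an enum replaces a chain of
    // if (name == "circle") ... else if (name == "square") ...
    public int Sides(Shape shape)
    {
        switch (shape)
        {
            case Shape.Circle:
                return 0;
            case Shape.Square:
                return 4;
            case Shape.Triangle:
                return 3;
            default:
                // Reached only if a new Shape is added and not handled here.
                throw new ArgumentOutOfRangeException("shape");
        }
    }
}
```

A misspelled string would silently fall through an if/else chain, whereas a misspelled enum member does not even compile.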
Breaking down a large class undoubtedly improves the overall architecture of your program. However, determining which fields and methods to extract can be a hard task. Try to spot fields that tend to be used together in all methods.
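As a small sketch of this heuristic, suppose a hypothetical `Customer` class had `street` and `city` fields that always appeared together. They can move to their own `Address` class, together with the code that only touched them. All names here are invented for the example.

```csharp
// Hypothetical classes, for illustration only.
public class Address
{
    private readonly string street;
    private readonly string city;

    public Address(string street, string city)
    {
        this.street = street;
        this.city = city;
    }

    // Formerly Customer code that read only street and city.
    public string Label
    {
        get { return this.street + ", " + this.city; }
    }
}

public class Customer
{
    private readonly string name;
    private readonly Address address;

    public Customer(string name, Address address)
    {
        this.name = name;
        this.address = address;
    }

    public string MailingLabel
    {
        get { return this.name + "\n" + this.address.Label; }
    }
}
```

`Customer` shrinks, and `Address` can be reused by any other class that needs one.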
Get rid, as much as possible, of static fields and methods. It is always possible to write a whole program with only static methods. But then, it is easy to lose the purpose of each class. Since the class hierarchy is not driven by the data anymore, it is harder to feel the meaning and responsibility of each class. The tendency to make larger classes increases. Data tends to be decoupled from its processing code. The number of parameters tends to grow. And static fields unnecessarily use up memory... So every time you see a static method, try to move it inside the class of one of its parameters.
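Here is a minimal sketch of moving static methods into the class of one of their parameters. The `Rectangle` class and the former static helpers mentioned in the comments are hypothetical.

```csharp
// Hypothetical class, for illustration only.
public class Rectangle
{
    private readonly int width;
    private readonly int height;

    public Rectangle(int width, int height)
    {
        this.width = width;
        this.height = height;
    }

    // Before: static int Area(Rectangle r) in some helper class,
    // which forced width and height to be exposed.
    public int Area
    {
        get { return this.width * this.height; }
    }

    // Before: static bool Fits(Rectangle inner, Rectangle outer).
    // One parameter disappears: the inner rectangle is now 'this'.
    public bool FitsInside(Rectangle outer)
    {
        return this.width <= outer.width && this.height <= outer.height;
    }
}
```

The data and the code that processes it now sit in the same class, and the fields can stay private.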
Applying these refactorings may not always be obvious. You will often need to first slightly bend the code to your will. Do not hesitate to reorder two instructions, remove a spurious test, or add some instructions. By doing this you will increase code regularity and often find corner case bugs. Besides, please note that I am not advocating reducing code by shortening variable names or removing information which implicitly holds. On the contrary, I believe variable and method names are the best documentation and should be carefully chosen. Also, even though it is unnecessary, I prefer to explicitly state when fields or methods are private and when classes are internal, and to prefix references to class variables and methods with this.
In this post, I presented several refactoring techniques. More importantly, I presented my canons for code beauty in the form of three principles:
- Less is more,
- Local is manageable,
- Innovation is risk.
Hasten slowly, and without losing heart,
Put your work twenty times upon the anvil.
Here are my questions for today:
- What principles guide your code development?
- Do you refactor often, or would you rather patch and avoid the risk of breaking code that works?
- I know my list of refactoring techniques is by no means exhaustive. What other fundamental refactorings do you like to apply?
- Do you know of any automatic tool that either performs simplifications, or lets you know of possible code simplifications?