Wednesday, March 9, 2011

Coding rules for clean and robust C# code

Introduction
This post presents my coding standards for writing clean and consequently robust C# code.
As I stated in a previous post, more time is spent understanding, debugging and reading code than actually writing it. So it is usually worthwhile to spend some effort writing clear code upfront.

These standards thus have two main objectives, namely to:
  • set a limit on code complexity, in both size and shape, so that all coding tasks always remain feasible,
  • establish a normalized way of writing code, so that anybody can quickly understand any part of the project.
In an ideal world, there would be only one best way of writing any piece of code, and the meaning of the code would be plainly obvious at first read. Following the rules listed in this post is, in my opinion, a step towards this ideal.

I am not claiming this is the only valid set of coding rules, or even that it is complete. However, I have seen many ways for a software project to fail, and following these rules surely limits the risks.
Even though some rules may seem arbitrary, they are all motivated by past experiences, some of them unfortunate. So, if you do not understand the rationale for some rule, please let me know and I will try to write further explanatory posts.
Lastly, I tried to order the rules by importance. Very often, coding standards are so long that nobody reads them, let alone applies them. So if you feel this is too much, try to follow the first few rules only. And maybe, in the future, some tool will be able to check all these rules automatically.

Driving principles
First, here are a few general and driving principles:
  • Keep it simple: try to implement the simplest solution that solves the requirements.
  • Do not over-design: in other words, do not code for requirements that do not exist yet but might in the future.
  • Do not optimize prematurely: performance problems are solved after they are spotted, not when they are imagined.
  • Remove known bugs before coding new functionalities.
  • Write at least one automatic non-regression test before fixing any bug. This is really the minimum. Writing tests before coding (full-fledged TDD) is best.
  • Refactor and clean up code often.
The coding priorities are:
  • robustness: the code does not unexpectedly stop,
  • correctness: the code does what it is expected to do,
  • performance: the code is sufficiently efficient both in time and space.
Bojagi, or the minimality of patterns

Size limitations

The cost of all software development tasks grows, non-linearly, with the size of the code. So the most important rule is to try to limit the line count. Here are some limits I like to apply:
  • 1 class per file,
  • less than 20 000 lines of code per solution (beyond that, split the code into libraries),
  • less than 1000 lines per class,
  • less than 100 lines per method,
  • less than 150 characters per line,
  • less than 10 fields per class,
  • less than 4 arguments per method.
Please keep in mind these are approximate values that give an idea of the optimal scale for each category.

Forbidden language constructs
Here are language constructs that are absolutely forbidden:
  • Static methods, except for Main.
  • Static fields (unless they are private readonly constants).
  • Nested classes. Why hide some code that could be reused? Also, it is better to have the whole complexity of your architecture apparent.
  • Goto statements.
  • Preprocessor directives (the closest thing C# has to macros), in particular regions (#region, #endregion) and conditional code (#if DEBUG).
  • The ref method parameter.
  • The new method modifier.
  • Operator overloading.
  • Dead code.
  • Obsolete code that is commented out. After all, you should be using a powerful source control system such as Mercurial (read this and that, if you don't know what I am talking about).
  • The ternary conditional expression c?e1:e2.

Strongly discouraged language constructs

Here are language constructs which should be used sparingly:
  • The null value can almost always be avoided. In particular, you should never pass the null value as an argument. It is better to create another variant of the method. Returning null can be advantageously replaced by throwing an exception. In other cases, you can use the null object pattern. When null is used to encode a special case, then document it explicitly.
  • Casts are forbidden in general. They may be necessary to implement powerful functionalities, such as dynamic class loading. But then, they must be hidden in thoroughly tested libraries. Their use must be motivated and documented.
  • Generic classes should rarely be defined, even though their use is encouraged. Use generic classes for only one thing: when you need a collection of elements that all have the same type. This practice avoids unnecessary casts. Any other need should be carefully evaluated. In particular, powerful C++-style template tricks are banned.
  • Comments should be as few as possible. Special one-line comments starting with TODO:, which signal minor tasks or open questions, are allowed. Nothing is worse than paraphrasing comments such as:
    // Creates an instance of MyClass
    this.myField = new MyClass();
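
To make the advice on null more concrete, here is a minimal sketch of the null object pattern combined with "another variant of the method". All the names (ILogger, MemoryLogger, NullLogger, Importer) are hypothetical and only serve the illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical logging abstraction, used only to illustrate the pattern.
public interface ILogger
{
    void Write(string message);
}

// Real implementation: keeps every message in memory.
public class MemoryLogger : ILogger
{
    private readonly List<string> messages;

    public MemoryLogger()
    {
        this.messages = new List<string>();
    }

    public int Count { get { return this.messages.Count; } }

    public void Write(string message)
    {
        this.messages.Add(message);
    }
}

// Null object: same interface, deliberately does nothing.
public class NullLogger : ILogger
{
    public void Write(string message)
    {
    }
}

public class Importer
{
    private readonly ILogger logger;

    // Variant without a logger, so that callers never have to pass null.
    public Importer()
    {
        this.logger = new NullLogger();
    }

    public Importer(ILogger logger)
    {
        this.logger = logger;
    }

    public int Import(string[] lines)
    {
        // No null check is needed: this.logger is always a valid object.
        this.logger.Write("Importing " + lines.Length + " lines");
        return lines.Length;
    }
}
```

A caller that does not care about logging simply writes new Importer(), and the body of Import stays free of null tests.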

Reducing dependencies and making them explicit

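  • Use the most restrictive visibility for classes: internal then public.
  • Similarly, use the most restrictive visibility for fields, properties and functions: private, then protected or internal, then protected internal, then public. Note that protected internal grants access to derived classes as well as to the whole assembly, so it is less restrictive than either protected or internal alone.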
  • Minimize using directives. Put them all at the start of the file rather than fully qualifying class names in the code.
  • Access fields through this. rather than prefixing them with m_ or _.
  • Try to abide by the law of Demeter: avoid successive field accesses in the same statement.
  • Minimize and comment the use of delegates and events. They make the control flow harder to follow!
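
As an illustration of the law of Demeter, here is a small sketch in which the caller never reaches through one object to get at another. The Car and Engine classes are hypothetical examples, not taken from a real project:

```csharp
using System;

public class Engine
{
    private bool isRunning;

    public Engine()
    {
        this.isRunning = false;
    }

    public bool IsRunning { get { return this.isRunning; } }

    public void Start()
    {
        this.isRunning = true;
    }
}

public class Car
{
    private readonly Engine engine;

    public Car()
    {
        this.engine = new Engine();
    }

    public bool IsStarted { get { return this.engine.IsRunning; } }

    // Callers write car.Start() instead of car.Engine.Start(),
    // so they do not depend on how Car is built internally.
    public void Start()
    {
        this.engine.Start();
    }
}
```

The engine is not even exposed, so replacing it later does not affect any caller of Car.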

Structuring statements
  • Initialize fields in the constructors rather than next to their declaration.
  • Declare all objects that implement the IDisposable interface with a using statement. On rare occasions this may not be possible, in which case you should document how the object will be disposed. Streams and Windows forms are examples of classes that implement IDisposable.
  • Boolean conditions should not have any side-effects.
  • Reduce the nesting level of the code. In particular, prefer short conditions and early exits, so that the following code:
    if (x != null && y != null)
    {
        x.DoSomething(y.GetSomething());
    }
    is rewritten into:
    if (x == null) return;
    if (y == null) return;
    x.DoSomething(y.GetSomething());
  • Constants must be declared const whenever possible, static readonly otherwise.
  • Do not use error codes; always prefer exceptions.
  • Ensure that all exceptions get caught at some point. In the worst case, set a general exception handling mechanism at the root of the program. Note this may not be that straightforward because of threads and the message loop of graphical applications.
  • Adding a logger is a good idea before shipping code.
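
Several of these rules can be seen together in the following sketch: a disposable object declared with using, an exception instead of an error code, and a general handler at the root. The ConfigurationReader and Application classes are hypothetical, and a real program would call Run from Main:

```csharp
using System;
using System.IO;

public class ConfigurationReader
{
    // Throws instead of returning an error code or null.
    public string ReadFirstLine(string content)
    {
        // StringReader implements IDisposable, so it is declared with using
        // and is disposed even if an exception is thrown inside the block.
        using (StringReader reader = new StringReader(content))
        {
            string line = reader.ReadLine();
            if (line == null) throw new InvalidDataException("The configuration is empty.");
            return line;
        }
    }
}

public class Application
{
    // Root of the program: the last place where an exception can be caught.
    // The returned value is the process exit code, not an internal error code.
    public int Run(string content)
    {
        try
        {
            Console.WriteLine(new ConfigurationReader().ReadFirstLine(content));
            return 0;
        }
        catch (Exception exception)
        {
            Console.Error.WriteLine("Unhandled error: " + exception.Message);
            return 1;
        }
    }
}
```

This sketch only covers the main thread. As noted above, worker threads and the message loop of a graphical application need their own handling.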

Process rules
Bug tracker and project management
  • A bug-tracker such as Redmine should be used.
  • Tickets must be solved in the order of priority.
  • Once a ticket is solved, it should first be sent back to its author, who is the only one authorized to validate and close it.
Rules for commit messages
  • Use a version control system such as Mercurial.
  • All commit messages are prefixed by one of the following:
    [add]: adding a new feature,
    [fix]: fixing a bug (optimizations fall under this category),
    [clean]: cleaning up code,
    [doc]: adding or changing comments and documentation,
    [spec]: adding a new test,
    [merge]: merging branches.
  • Whenever possible, it is also advised to indicate a ticket number in the commit message.
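
The prefix rule is simple enough to be checked by a tool. Here is a minimal sketch of such a check. The CommitMessageChecker class is hypothetical, and how it would be wired into a repository hook is left open:

```csharp
using System;

public class CommitMessageChecker
{
    private readonly string[] prefixes;

    public CommitMessageChecker()
    {
        // The prefixes listed in the rules above.
        this.prefixes = new string[] { "[add]", "[fix]", "[clean]", "[doc]", "[spec]", "[merge]" };
    }

    // Returns true if the message starts with one of the allowed prefixes.
    public bool IsValid(string message)
    {
        foreach (string prefix in this.prefixes)
        {
            if (message.StartsWith(prefix, StringComparison.Ordinal)) return true;
        }
        return false;
    }
}
```

For instance, a message such as "[fix] #42: handle empty input" is accepted (the ticket number format is only an assumption), whereas "handle empty input" is rejected.
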
Organisation of a solution
  • Every Mercurial repository stores one solution.
  • Every solution has a project that produces either a library (.dll) or an executable (.exe).
  • Every solution has a project PrePush that contains pre-push tests, and a project PostPush for lengthy tests.
  • The central repository compiles and executes all pre-push tests as a hook before accepting any push.
  • A continuous integration system such as Hudson executes both pre-push and post-push tests. It also triggers the check of downstream solutions.
  • Every solution has a version number, preferably generated from the revision number of the source control system.
  • For library projects, automatically generated documentation, produced with a tool such as Doxygen, is advised.
Non-regression tests
  • NUnit non-regression tests are mandatory.
  • The minimum consists in writing a test before fixing any bug. That way at least the same mistake is not made twice.
  • Better yet, apply TDD. That is, write tests before even starting to write code.

Syntactic conventions
A project should follow syntactic conventions, so that reading the code is easier for everybody. I believe that code should be light, succinct and go to the point. So I tried to choose conventions that remove any unnecessary hindrance to one's understanding.
Note that limiting complexity, compartmentalizing code and writing automatic tests are much more important than focusing on the small details of instruction syntax. So the rules stated in this section of the coding standards are really the least important. However, surprisingly, most coding standards seem to contain several (boring) pages of such rules. Note also that most of these rules can be automatically checked by tools of minimum intelligence.

Naming conventions
  • Use camel case; underscores "_" are forbidden except in test suites.
  • Project, namespace, class, interface, struct, method and property names start with a capital letter.
  • Fields and variables start with a lowercase letter.
  • Interfaces start with I.
    Correct:
    struct Data;
    int bufferSize;
    class HTMLDocument;
    String MimeType();
    Incorrect:
    struct data;
    int buffer_size;
    class HtmlDocument;
    String mimeType();
  • Avoid abbreviations (except, maybe, for loop counters and X-Y coordinates).
    Correct:
    int characterSize;
    int length;
    int index;
    Incorrect:
    int charSize;
    int len;
    int idx;
  • Getters should have the same name as the field they access, except their first letter is capital.
    Correct:
    int Count { get { return this.count; } }
    Incorrect:
    int getCount { get { return this.count; } }
  • Prefix boolean variables by is, has...
    Correct:
    bool isValid;
    Incorrect:
    bool valid;
  • Prefix event names by Event,
  • Prefix methods that handle an event with On,
  • Prefix delegate types by EventHandler,
  • Prefix methods and properties created for test purposes only with Test,
  • Prefix test methods that trigger an event with TestTrigger.
  • As much as possible, method names should contain a verb. Try using few different verbs. Avoid fancy verbs whose meaning is not clear to everybody. Try using the same verb for the same concept.
  • Prefix field accesses with this., rather than prefixing field names with "_" or "m_".
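
Here is a short sketch that applies the event-related prefixes together with the other naming rules. The Thermometer and Display classes are hypothetical:

```csharp
using System;

// Delegate types are prefixed by EventHandler.
public delegate void EventHandlerTemperatureChanged(int temperature);

public class Thermometer
{
    private int temperature;

    public Thermometer()
    {
        this.temperature = 0;
    }

    // Event names are prefixed by Event.
    public event EventHandlerTemperatureChanged EventTemperatureChanged;

    // The getter has the name of its field, with a capital first letter.
    public int Temperature { get { return this.temperature; } }

    public void SetTemperature(int temperature)
    {
        this.temperature = temperature;
        // An event without subscribers is null in C#, hence this check.
        if (this.EventTemperatureChanged != null) this.EventTemperatureChanged(temperature);
    }
}

public class Display
{
    // Boolean fields are prefixed by is or has.
    private bool isAlarmOn;

    public Display(Thermometer thermometer)
    {
        this.isAlarmOn = false;
        thermometer.EventTemperatureChanged += this.OnTemperatureChanged;
    }

    public bool IsAlarmOn { get { return this.isAlarmOn; } }

    // Methods that handle an event are prefixed by On.
    private void OnTemperatureChanged(int temperature)
    {
        this.isAlarmOn = temperature > 100;
    }
}
```

With these prefixes, a reader can tell events, handlers and delegate types apart from their names alone.
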
Spacing
Care should be taken in presenting statements. Code with an irregular layout looks messy. It is an indication of the poor care that was put into the code. And, according to the broken windows theory, it encourages further degradation by other coders.

Here are some of my preferred layout conventions:
  • Each instruction, even short ones, has its own line.
    Correct:
    x++;
    y++;
    Incorrect:
    x++; y++;
  • Braces have their own line. This rule applies regardless of the element that the braces enclose: namespace, class, method, property, conditionals... Is this convention better than the Java one, where the opening brace is on the same line as the keyword? Frankly, it doesn't matter much, as long as you are consistent. But if you really ask me, having the braces on their own lines makes it easier to visually identify blocks, plus an if statement can quickly be suppressed with a single one-line comment.
    Correct:
    void Execute()
    {
        DoIt();
    }

    if (condition)
    {
        DoTrue();
    }
    else
    {
        DoFalse();
    }
    Incorrect:
    void Execute() {
        DoIt();
    }

    if (condition) {
        DoTrue();
    }
    else {
        DoFalse();
    }

    if (condition)
        DoTrue();
    else
        DoFalse();

    if (condition) DoTrue(); else DoFalse();

    if (condition) DoTrue(); else {
        DoFalse();
    }
  • For very short bodies, such as those of getters, everything can be put on the same line. In this case, spaces around the braces are mandatory.
    Example:
    int Info { get { return this.info; } }
  • For if statements, if the body statement fits entirely on one line and there is no else, then the braces are omitted.
    Examples:
    if (condition) return;
    if (condition) break;
    if (condition) x = 0;
    if (condition) continue;
  • Put a space after keywords if, while, switch and foreach.
    Correct:
    if (condition)
    {
        DoIt();
    }
    Incorrect:
    if(condition) {
        DoIt();
    }
  • Boolean expressions that do not fit on one line should be split before their operators, with the operators aligned on the left.
    Correct:
    if (b1
        || b2
        || b3)
    {
    }
    Incorrect:
    if (b1 ||
        b2 ||
        b3)
    {
    }
  • Avoid spaces around unary operators.
    Correct:
    i++;
    Incorrect:
    i ++;
  • Put spaces around binary operators.
    Correct:
    y = (m * x) + b;
    c = a | b;
    Incorrect:
    y=m*x+b;
    c = a|b;
  • Put no space before and one space after a comma, colon or semi-colon.
    Correct:
    DoSomething(a, b);
    class A: I
    {
    }
    for (int i = 0; i < 10; i++)
    {
    }
    Incorrect:
    DoSomething(a,b);
    class A:I
    {
    }
    for (int i = 0 ; i < 10 ; i++)
    {
    }
  • Avoid spaces between the name of a function and its opening parenthesis or between a parenthesis and an argument.
    Correct:
    DoIt(a, b);
    Incorrect:
    DoIt (a, b);
    DoIt( a, b );

Questions
Here are a few things I would like to ask you:
  • Does your project have coding standards?
  • Are they known and followed by the team members?
  • Do you think it is a good idea to ask for all non-negative integer values to have the type uint? I noticed in practice, the type uint seems to be very rarely used.
  • What about software cost estimation models such as COCOMO? Can this kind of model bring any value to the development process? How?

7 comments:

  1. A few points where I beg to differ:

    1) Solutions vs repositories

    In my experience (which you shared:-), having lots of small solutions that import each other's dlls is a PITA since it disables VS's code browsing features. Adding extra global solutions including projects from various repositories is a painful operation due to technical details.

    For a fresh start, I would propose the following organisation:
    - every repository contains the same .csproj as in your system, but without a .sln
    - there is a global .sln that references all the .csproj files from the various repositories.

    In that way, we keep the separation of repositories (which minimizes the merges) and the benefits of a global .sln.

    2) Exceptions vs error codes.

    Throwing and catching exceptions is tremendously slow, making it surprisingly easy to hit a bottleneck. You may call that premature optimization, but I have been so badly bitten in the * that I now tend to avoid them whenever it is not perfectly clear that they will only be thrown in exceptional cases.

    An aside on exceptions: we shouldn't use the catch all construct, and instead catch exceptions selectively, so as to avoid hiding bugs. But the standard library is poorly documented with respect to the exceptions that may be thrown, so that I am forced to use a lot of catch alls. This is yet another reason to use exceptions sparingly as the exception channel is so polluted.

    3) Gotos
    Of course, gotos should be avoided when a higher level construct is available. But there are a few cases where they are the cleanest solution. My favorite example: two nested loops, where, in the inner loop, you wish to perform a continue/break on the outer loop. A goto will solve this problem much more robustly than a kludge with flags.

    4) Nested Classes
    Two cases where I like them:
    - when implementing a control with a list view or tree view, I usually define a class derived from ListViewItem or TreeNode. I like to keep that class definition inside the control that will use it, so that I can just name it Item/Node, instead of an ugly PloumItem where Ploum is the name of the control. More generally, I hate naming a class PloumPlam where I could call it Ploum.Plam.
    - when defining specific exceptions for a method's failure cases, I find it clearer to attach them to the class containing the method.

    5) Template Tricks
    Well, their purpose is to avoid casts and make code safer; I never understood your phobia of them. There is however a pretty good argument against them: in the early versions of the 2.0 framework (for instance that shipped with VS2005), the JIT compiler had bugs with them.

    6) Frequent refactoring, and refusal of overdesign and premature optimization.

    These are sound principles in the early stage of a product. However, once you start deploying it, you need to worry about compatibility if you plan to add features later (think of the PM's file format). As time passes, you grow more and more constrained in what refactoring you can perform, if you did not allow for extensions in your architecture.

  2. Nice comments.
    My general answer would be that "he who knows his tools can afford to use them in dangerous ways".
    In more detail:
    1) I like it and should try some day.
    2) As of today, I never noticed performance bottlenecks due to exceptions. Most of, if not all, the performance problems I encountered were due to the use of unnecessarily expensive algorithms (O(n^2) instead of O(n) or O(n) instead of O(log(n))...) But who knows?
    3) Maybe, it could happen. But do you really need these two nested loops?
    4) case 1. Ok, maybe. Or maybe you could have parametrization by template, or by encapsulation, instead of having it by inheritance.
    case 2. Why not.
    5) Templates are nice. But overuse of templates is a burden on code comprehension. (You should see some of the code I had to live with :)
    6) A very sound point. I guess, if you want optimal systems, then you need to break backward-compatibility once in a while. With the hope of converging...

  3. 2) Actually, you did: updating the error list was very slow, due to my using a try dic[key] catch instead of dic.TryGetValue(key).

    We recently found an instance in the compact parser: an unknown keyword followed by lots of numbers -> on every line, the attempt to find a keyword fails, resulting in a throw and a catch. There were 250k lines, and it took forever to parse them. Using a return code instead allowed me to recover a normal speed.

  4. To answer your question about uint, my experience is that every time I try to use unsigned types, I end up having to convert back and forth between the signed and unsigned versions. Generally, I give up and remove the "u"s.

    It's a pity in C, where signed types have lots more undefined behaviours than their unsigned counterparts (I am always nervous when applying bitwise operations on signed types).

    However, this is not the case in C#; perhaps we should not bother?

  5. Please recommend any good books, available as PDF, for mastering code design principles.
