Tuesday, April 27, 2010

Mercurial: a granary for software

Never since the invention of agriculture has a human activity revolved so much around the accumulation of labour as... code writing. Fortunately, in contrast to grain, software does not rot. Or does it?

Well, have you ever deleted some important files by mistake? I certainly have, and I still remember the sheer horror piercing my heart when I realized what I had done. Inadvertent deletion is just one of the least refined ways of wasting hours of coding in the blink of an eye. Here are some others:
  • you try to add yet another small feature to a complex program and suddenly nothing works anymore,
  • you factor two seemingly identical methods, and some test breaks,
  • you remove some obviously useless redrawing code and three days later some weird behaviour pops up...
In any of these cases, don't you wish you could simply turn back time, return to the moment just before things broke and forget all about it?

That is exactly what source control systems are for: they are "the granaries of software".

Granaries in Niger

Today, I will introduce my personal favorite source control system: Mercurial. Written in Python, Mercurial is available for both Windows and Unix systems. Under Windows, I recommend downloading the graphical interface TortoiseHg instead. All the following examples were executed in a Windows shell (cmd.exe) with TortoiseHg installed. The Mercurial commands are identical in any other configuration.

First, to check that Mercurial is correctly installed, let us type:
hg --version
This should display a version number and copyright notice. Now make an empty directory and enter it:
mkdir hg-project
cd hg-project
To initialize a new repository simply type:
hg init
This creates a subdirectory .hg in which Mercurial keeps all information about the repository. Let us add a dummy file dummy.txt with some text:
echo "Hello Mercurial!" > dummy.txt
Now, if you type:
hg status
Mercurial displays the status of the files present in the repository. Right now, there is only one file, dummy.txt, and it is not yet tracked. This is indicated by a question mark next to the file name, like this:
? dummy.txt
To track the file, first type:
hg add dummy.txt
The status (obtained by typing the command hg status) becomes:
A dummy.txt
which means dummy.txt is marked for addition in the next revision.
To create a new revision, simply type:
hg commit --user "James Hacker" --message "[add] First revision of the repository. Added a dummy file for explanatory purposes."
Note how a user name, here "James Hacker", was required. This allows Mercurial to track the author of each modification of the code base. The message passed to the --message option describes the purpose of this particular revision. At this point, typing hg status will not display anything. This means all files in the current directory are tracked and synchronized with their committed version.
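Besides ? and A, hg status marks files with a handful of other one-letter codes. As a quick reference, here is a small sketch, written as a Unix shell function, that spells out the codes listed by hg help status. The function name describe_status is my own invention for illustration and is not part of Mercurial.

```shell
# Translate the one-letter code at the start of an `hg status` line.
# describe_status is a hypothetical helper name, not a Mercurial command.
describe_status() {
    case "$1" in
        M) echo "modified" ;;
        A) echo "added" ;;
        R) echo "removed" ;;
        C) echo "clean" ;;
        '!') echo "missing" ;;
        '?') echo "not tracked" ;;
        I) echo "ignored" ;;
        *) echo "unknown code" ;;
    esac
}

describe_status '?'   # prints: not tracked
describe_status A     # prints: added
```

The question mark and the exclamation mark are quoted so that the shell matches them literally instead of treating them as special characters.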

That is where the real fun begins! Inadvertent deletion of tracked files is no longer to be feared:
del dummy.txt
The status of the repository indicates that dummy.txt is missing. This is displayed with an exclamation mark next to the file name:
! dummy.txt
In order to recover the lost file, just type:
hg revert --all

Another scenario is when you code for a few days, but end up disappointed with the result and want to go back to your initial state. To simulate this scenario, let us first replace the text in dummy.txt:
echo "By doing this, I am taking the wrong path" > dummy.txt
The status of the repository indicates, by an M in front of the file name, that dummy.txt was modified:
M dummy.txt
We commit this modification to the repository:
hg commit --user "James Hacker" --message "[add] Performed some experiment. This may turn out to be a mistake..."
On second thought, we are not happy with this revision. Fortunately, it is easy to recover any previous state. Mercurial assigns each revision a distinct number, starting from 0. To get an overview of all past revisions, we can type:
hg history
It should display a list of revisions, each with its number, author, date of creation and description. We want to go back to the first revision (numbered 0), so we type:
hg update --rev 0
And voila! If we check the content of dummy.txt, it is now back to its initial content:
more dummy.txt
From there, we could add a new file to the repository and perform another commit:
echo "Tracking a second file." > another-dummy.txt
hg add another-dummy.txt
hg commit --user "James Hacker" --message "[add] A second dummy file."
Mercurial warns us that another head was created in the repository. Indeed, if we check the history of the repository (with the command hg history), three revisions are displayed. Even though we considered the second revision (revision 1) to be a dead end, it is never deleted from the repository: Mercurial keeps it in case we change our mind and want to start again from there. In fact, we can safely travel to any previous revision. The command hg parent tells us our current position in the tree of all revisions. Let us briefly play with this feature:
hg parent
hg update --rev 1
hg parent
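Why did the third commit create a second head? A head is simply a revision that has no child. The sketch below is a toy model of that rule in plain Unix shell, not the way Mercurial stores or computes anything. Each argument is written as revision:parent, with -1 standing for "no parent", a notation I made up for this example. The function name list_heads is hypothetical too; in a real repository, hg heads gives you the answer.

```shell
# Toy model only: each argument is "revision:parent", -1 meaning "no parent".
# list_heads is a hypothetical helper; Mercurial answers this with `hg heads`.
list_heads() {
    for entry in "$@"; do
        rev=${entry%%:*}
        is_head=yes
        for other in "$@"; do
            # A revision that is some other revision's parent is not a head.
            if [ "${other#*:}" = "$rev" ]; then
                is_head=no
            fi
        done
        if [ "$is_head" = yes ]; then
            echo "$rev"
        fi
    done
}

# After the three commits above, revisions 1 and 2 both have revision 0 as parent.
list_heads 0:-1 1:0 2:0   # prints 1 and 2, one per line
```

With only the first two commits (list_heads 0:-1 1:0), the sole head is revision 1. Committing on top of revision 0 adds a second childless revision, hence the warning.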

Rome was not built in a day, and neither is great software. A source control system such as Mercurial is the granary that protects your code from disasters during development.
This post presented the basics of Mercurial. In a later post, I will talk about the Mercurial graphical user interface TortoiseHg, explain how to change the default settings and share my personal practices for revision messages.

Some questions for today:
  • Do you use a source control system?
  • If not, why?
  • If so, what is your preferred source control system?

Sunday, April 18, 2010

Foreword

Experienced software developers know there is more to coding than coding itself. Yet academic curricula still seem to focus almost exclusively on algorithmic theory. At least mine did! Good academic curricula sometimes include courses on complexity and on parsing tools such as lex/yacc.

This blog presents my way of developing software. I developed this method slowly, through many mistakes and much experience in the coding trenches, and I now believe it lets me produce quality code efficiently.
Rather than reminding you of the 37 different sorting algorithms, I will talk about rules, methodology and tools:
  • Rules direct work. They limit freedom yet show the general direction. Good programmers love discipline. For instance, they insist on limiting code size, and do not fear syntactic restrictions put on the programming language they work with.
  • Methodology structures daily work. Even though it seems impossible to estimate the duration of any large coding task beforehand, following a methodology makes it possible to track progress. It also relieves the stress of deciding what to do next.
  • As for tools, there is a French saying that goes "méchant ouvrier point de bons outils", which translates as "the bad workman always blames his tools". In contrast, not only do good developers have quality tools, but the tools also improve their work by shaping their thinking and actions.
Dear reader, I hope you find the posts in this blog concrete, relevant and practical. While writing them I am also eager to hear from you. I am interested in your opinion, the improvements you see or the way you would rather do things...
By the way, did you learn any of the following topics at school: test-driven development, non-regression testing, unit testing, source control software, refactoring, software design or coding standards?
If you are a computer science teacher, do you teach any of these topics? If not, why?

Above all, with these writings, I would like to share my passion for the craft of programming!