Sunday, March 6, 2011

Mercurial: sharing code in the team

In a previous post, I explained how a source control system such as Mercurial can be used to prevent the accidental loss of source code. More specifically, I showed how to:
  • create a new source code repository with hg init,
  • track a file in the repository with hg add,
  • save a modification of the code base with hg commit,
  • and restore any previously committed version with hg update.
This time, I would like to explore another, even more important, function of source control systems: that of allowing a team of developers to work at the same time on the same piece of software. Consider the simplest case of only two developers: Laurel and Hardy are preparing the script for their next movie scene. They are pressed for time. They know that some large companies, such as International Big Movies, pair scriptwriters by opposite locations on earth: one in India with one in the US, or one in Japan with one in Europe. That way, while one works, the other one sleeps. However, this solution is clearly beyond their means. So they would rather work in parallel. Both would start with the same initial text, each would make his own revisions, and they would still be able to converge towards a final, common version. Mercurial will let them do just that, safely and almost seamlessly. Here is how.

2001: A Space Odyssey, a movie by International Big Movies?

Suppose three computers sit in Laurel and Hardy's office: stan, ollie and hal-roach. Laurel obviously works on stan, Hardy uses ollie and the last machine hal-roach is a server for administrative tasks. The server hal-roach is accessible to both stan and ollie through the network.
They first set up a central repository, say script, on hal-roach. At all times, this repository contains the reference version of the script. For now, it is empty. Let me recall the commands to create a repository from an empty directory:
hal-roach> mkdir script
hal-roach> cd script
hal-roach> hg init
In order to exchange data with the central repository, they must first publish it. To do so, they have several options, including publishing through:
  • http with the built-in server via command hg serve,
  • http(s) with an existing web server, such as Apache,
  • ssh,
  • a shared disk system mechanism, such as Samba.
Most of these methods are described in this page of the Mercurial wiki. Complex administration tasks are not Laurel and Hardy's cup of tea. Since they are going to access the central repository through their trusted internal network, they choose the easiest and fastest method: the built-in server. First, they configure the repository so that anybody can deposit changes via http. To do so, they edit the configuration file hgrc in subdirectory .hg so that it contains the lines:
[web]
allow_push = *
push_ssl = false
Then they run the server with command:
hal-roach> hg serve
Using a web-browser, they can now check the status of the repository simply by going to the address:
http://hal-roach:8000/
Next, Laurel gets a copy of the whole repository on his machine:
stan> hg clone http://hal-roach:8000/ script
Hardy does the same on his own machine:
ollie> hg clone http://hal-roach:8000/ script
Laurel begins work.
stan> cd script
He creates a file dialogue.txt with the following content:
Stan: You know, Ollie, I been thinkin'!
He then commits the new file:
stan> hg add dialogue.txt
stan> hg commit -m "[add] First line"
In order to make his contribution available to Hardy, he now pushes the state of his local repository towards the central repository:
stan> hg push
Hardy retrieves the latest changes from the central repository by typing command:
ollie> hg pull -u
And adds his line to the dialogue:
Ollie: Waht about?
He then commits and pushes:
ollie> hg commit -m "[add] a question"
ollie> hg push
Now Laurel pulls the latest changes:
stan> hg pull -u
Adds the next two lines, commits and pushes:
Stan: Well, if we caught our own fish, then we wouldn't
have to pay for it and whoever we sold it to, it would
be clear profit.
Ollie: Tell me that again!
stan> hg commit -m "[add] my answer"
stan> hg push
Immediately after his own push, Hardy notices that he typed 'Waht' instead of 'What'. So, while Laurel crafts his next lines, Hardy commits a correction:
Ollie: What about?
ollie> hg commit -m "[fix] small typo"
But when he wants to push, Mercurial aborts:
ollie> hg push
abort: push creates new remote heads on branch 'default'!
(you should pull and merge or use push -f to force)
So he pulls the latest changes made by Laurel. He then has a look at the history of changes on the repository:
ollie> hg log
There are four commits so far, numbered 0 to 3. Commits 2 and 3 both originate from commit 1. There are thus two parallel versions of the script or, in Mercurial jargon, two heads. The heads of a repository can be obtained by typing command:
ollie> hg heads
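To make the notion of a head concrete, here is a minimal Python sketch. It is a toy model of the history, not Mercurial's own code: I assume that on ollie the typo fix is commit 2 and Laurel's freshly pulled answer is commit 3, and the `parents` table and the `heads` function are names made up for the illustration.

```python
# Toy model of the history on ollie after the pull (not Mercurial's own code).
# Each commit number maps to the list of its parents.
parents = {
    0: [],    # "[add] First line"
    1: [0],   # "[add] a question"
    2: [1],   # "[fix] small typo"  (assumed local number on ollie)
    3: [1],   # "[add] my answer"   (assumed number after the pull)
}


def heads(parents):
    """Return the commits that are not the parent of any other commit."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)


print(heads(parents))  # → [2, 3]
```

A merge commit whose parents are 2 and 3 would become the only head, which is exactly the situation Hardy needs before pushing.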
Mercurial refused the push to avoid replicating this situation on the central repository. It is, in fact, generally preferable to have only one head on the reference repository. So, before going any further, Hardy must reconcile both versions. To do so, he types command:
ollie> hg merge
Mercurial automatically combines both changes. The dialogue now reads:
Stan: You know, Ollie, I been thinkin'!
Ollie: What about?
Stan: Well, if we caught our own fish, then we wouldn't
have to pay for it and whoever we sold it to, it would
be clear profit.
Ollie: Tell me that again!
Hardy commits the merge and can now push:
ollie> hg commit -m "[merge]"
ollie> hg push
Since the differences between both heads were on distinct parts of the text, Mercurial merged them correctly without any manual intervention. Sometimes, as we will see next, there are conflicts and choices must be made. Laurel pulls the version Hardy just merged, ends the dialogue and pushes:
stan> hg pull -u
Stan: Well, if we caught our own fish, then we wouldn't
have to pay for it and whoever we sold it to, it would
be clear profit.
Ollie: That's a pretty smart thought!
stan> hg commit -m "[add] finished dialogue"
stan> hg push
But Hardy has a different idea. He completes the dialogue in his own way and commits:
Stan: Well, if we caught our own fish, then the people
we sold it to wouldn't have to pay for it, the profit
would go to the fish...
ollie> hg commit -m "[add] antimetabole"
Before he is able to push, he has to pull and merge once more.
ollie> hg pull
ollie> hg merge
This time, Mercurial is obviously not able to automatically choose the right version of the fifth dialogue line. Hardy chooses his version, commits and pushes.
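Why did the first merge go through on its own while this one needs a human? The following Python sketch of a three-way merge may help. It is a deliberate simplification and certainly not Mercurial's actual algorithm: lines are compared by position, so it only copes with in-place edits and with lines appended at the end. The function name `merge3` is my own.

```python
from itertools import zip_longest


def merge3(base, mine, theirs):
    """Position-by-position three-way merge of lists of lines.

    Returns (merged_lines, conflicts); conflicts holds the positions where
    both sides changed the same line in different ways.
    """
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip_longest(base, mine, theirs)):
        if m == t:        # both sides agree
            line = m
        elif m == b:      # only "theirs" changed this line
            line = t
        elif t == b:      # only "mine" changed this line
            line = m
        else:             # both changed it differently: someone must choose
            conflicts.append(i)
            line = m      # keep "mine" provisionally
        if line is not None:
            merged.append(line)
    return merged, conflicts


# First merge: Hardy fixes a typo, Laurel appends lines. No overlap.
base = ["Stan: You know, Ollie, I been thinkin'!", "Ollie: Waht about?"]
hardy = [base[0], "Ollie: What about?"]
laurel = base + [
    "Stan: Well, if we caught our own fish, then we wouldn't",
    "have to pay for it and whoever we sold it to, it would",
    "be clear profit.",
    "Ollie: Tell me that again!",
]
merged, conflicts = merge3(base, hardy, laurel)
print(conflicts)  # → []

# Second merge: both append a different line at the same position.
mine = merged + ["Stan: Well, if we caught our own fish, then the people"]
theirs = merged + ["Stan: Well, if we caught our own fish, then we wouldn't"]
print(merge3(merged, mine, theirs)[1])  # → [6]
```

The second call reports a conflict because neither side matches the common ancestor at that position, so no automatic rule can pick a winner.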

Laurel and Hardy both agree on the final version of the dialogue, which reads:
Stan: You know, Ollie, I been thinkin'!
Ollie: What about?
Stan: Well, if we caught our own fish, then we wouldn't
have to pay for it and whoever we sold it to, it would
be clear profit.
Ollie: Tell me that again!
Stan: Well, if we caught our own fish, then the people
we sold it to wouldn't have to pay for it, the profit
would go to the fish...
Ollie: That's a pretty smart thought!
Mercurial is pretty powerful, isn't it? However, like any other tool, it gives its best only when used properly. Here are a few simple practices I learned to follow.
First, I dislike long, painful, conflict-ridden merge sessions. So I prefer small and frequent commits. It also helps to make each developer responsible for distinct components. The standard sequence of Mercurial commands should be to:
  • pull the latest repository state before starting any task,
  • perform a (preferably small) task,
  • commit,
  • push,
  • pull, merge and push again if the previous push was refused because it would create an additional head.
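The sequence above can be sketched as a small Python wrapper around the hg command line. Take it as an illustration under assumptions rather than a tool to use as is: the function names `hg` and `share` are mine, and I treat any non-zero exit code of hg push as "the push was refused because of a new head", which is a simplification.

```python
import subprocess


def hg(*args, run=subprocess.run):
    """Run one hg command and return its exit code."""
    return run(["hg", *args]).returncode


def start_task(run=subprocess.run):
    """Pull the latest repository state before starting any task."""
    return hg("pull", "-u", run=run)


def share(message, run=subprocess.run):
    """Commit local work and push it, merging first if the push is refused."""
    hg("commit", "-m", message, run=run)
    if hg("push", run=run) == 0:
        return "pushed"
    # Assumption: a non-zero exit code means the push would create a new
    # head. A real script should look at the error message as well.
    hg("pull", run=run)
    if hg("merge", run=run) != 0:
        return "conflict: resolve by hand, then commit and push"
    hg("commit", "-m", "[merge]", run=run)
    hg("push", run=run)
    return "merged and pushed"
```

The `run` parameter only exists so that the sequence of commands can be checked without touching a real repository.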
Small commits are also easier to describe. Commit messages come in 6 different flavours, which I indicate with a keyword in square brackets:
  • [add] for any feature addition (usually goes with a corresponding test),
  • [fix] for any bug fix (usually goes with a corresponding test),
  • [clean] for any code refactoring,
  • [doc] for any comment: documentation, TODO, notes, remarks...
  • [spec] for any additional test,
  • [merge].
Such a taxonomy helps you quickly grasp the nature of the modifications performed by your co-workers when you come back from vacation.
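A convention like this one is easy to check mechanically. Here is a short Python sketch that splits a message into its keyword and its description and rejects anything outside the six flavours listed above. The names `TAGS` and `parse_message` are made up for the example.

```python
import re

TAGS = {"add", "fix", "clean", "doc", "spec", "merge"}
PATTERN = re.compile(r"^\[(\w+)\]\s*(.*)$", re.DOTALL)


def parse_message(message):
    """Split '[tag] description' into (tag, description).

    Raises ValueError when the tag is missing or unknown.
    """
    match = PATTERN.match(message)
    if match is None or match.group(1) not in TAGS:
        raise ValueError(f"unexpected commit message: {message!r}")
    return match.group(1), match.group(2)


print(parse_message("[fix] small typo"))  # → ('fix', 'small typo')
```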

I would like to end this post with the description of a procedure which I find crucial, even though it is, unfortunately, not widely followed. Did you notice that, in the running example, Hardy introduced a spelling error and pushed it to the central repository? Had he run a spell-checker, he would have caught his error before it ever reached the central repository. Mercurial provides a mechanism to run scripts automatically before every push. It is called a pre-push hook. Hardy could have set up a hook to run a spell-checker. That way, if the spell-checker finds an error, the push operation aborts.
For software projects, I like to set up a pre-push hook that compiles the code and runs a suite of non-regression tests. To do so, I add the following lines to the hgrc file in the .hg subdirectory of the central repository:
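What could Hardy's spell-checking hook look like? Below is a minimal Python sketch. Everything in it is hypothetical: the tiny `KNOWN_TYPOS` table stands in for a real dictionary, and the script name spellcheck.py used in the comment is invented. The only thing the hook mechanism needs is an exit status: zero to let the push proceed, non-zero to abort it.

```python
import re
import sys

# Hypothetical table of known misspellings; a real checker would rely on a
# proper dictionary instead.
KNOWN_TYPOS = {"waht": "what", "teh": "the"}


def find_typos(text):
    """Return (line_number, word, suggestion) for each known misspelling."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z']+", line):
            fix = KNOWN_TYPOS.get(word.lower())
            if fix:
                found.append((number, word, fix))
    return found


def main(paths):
    """Return 1 if any of the files contains a known typo, else 0."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for number, word, fix in find_typos(handle.read()):
                print(f"{path}:{number}: '{word}' (did you mean '{fix}'?)")
                status = 1
    return status


# Possible wiring (an assumption, to be adapted), in the .hg/hgrc of the clone:
#   [hooks]
#   pre-push = python3 spellcheck.py dialogue.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

With such a hook in place, the line "Ollie: Waht about?" would have been reported and the push refused.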
[hooks]
pretxnchangegroup.duplicate = hg push test-repository
pretxnchangegroup.check = cd test-repository && hg update && ./run_script_that_compiles_and_performs_tests
I am using two pretxnchangegroup hooks. The first hook propagates the changes brought by the push to a test repository. Then compilation and tests are run from the test repository. If any of the scripts fails, then the incoming changes are rolled back and the repository regains its previous state.
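The script that compiles and runs the tests can be very simple, since all the hook looks at is its exit status. Here is a hedged Python sketch of such a script. The two `make` commands in `STEPS` are placeholders that I made up, and `run_steps` is my own name; substitute whatever builds and tests your project.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order and stop at the first failure.

    Returns 0 when every step succeeds, otherwise the exit code of the
    step that failed.
    """
    for command in steps:
        code = subprocess.run(command).returncode
        if code != 0:
            print(f"step failed: {' '.join(command)}", file=sys.stderr)
            return code
    return 0


# Hypothetical build and test commands; replace them with the project's own.
STEPS = [
    ["make", "all"],
    ["make", "test"],
]
```

A real script would end with `sys.exit(run_steps(STEPS))`, so that a failing step makes the hook fail and the pushed changes are refused.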
In case you wonder: someone who tries to pull from the repository before the hooks complete will, fortunately, not get the latest changes. Note also that the machine which executes the hooks depends on the system used to serve the repository. With Samba, they run locally; with Apache, they run on the server as the restricted Apache user; with ssh, they run on the server under your identity. Also keep in mind that the Apache server may set a time limit on its connections.
Obviously tests that are run during a pre-push hook should not last too long. Otherwise pushing can become quite cumbersome and coders will be reluctant to do it often. That is why I like to keep the duration of pre-push tests below 1 minute (even if up to 5 minutes may be bearable). To achieve this while still having a good coverage, I try to keep my projects small. I structure large software developments into several small projects, each not exceeding 20 000 lines of code. If there are some tests that last long, I group them in another suite, called the post-push suite. This additional suite is executed later by the continuous integration system. (But this is a story for a future post...)
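One way to decide which tests go where is to measure them and fill the pre-push suite up to the time budget. The Python sketch below does this greedily, quickest tests first. It is only one possible policy, not a description of my actual setup, and the test names and timings are invented for the example.

```python
def split_suites(durations, budget=60.0):
    """Split tests into (pre_push, post_push) by duration in seconds.

    Greedy choice: take the quickest tests first until the budget is used up.
    """
    pre_push, post_push, total = [], [], 0.0
    for name, seconds in sorted(durations.items(), key=lambda item: item[1]):
        if total + seconds <= budget:
            pre_push.append(name)
            total += seconds
        else:
            post_push.append(name)
    return pre_push, post_push


# Hypothetical timings, in seconds.
timings = {"unit": 12.0, "parser": 20.0, "database": 25.0, "end_to_end": 240.0}
print(split_suites(timings))
# → (['unit', 'parser', 'database'], ['end_to_end'])
```

With a budget of one minute, the three quick suites (57 seconds in total) run before the push and the four-minute end-to-end suite is left to the post-push stage.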
Maybe some of you think that running all tests before accepting any modification is paranoid. However, the productivity gains achieved by finding bugs as early as possible can never be stressed enough. Bugs caught by the pre-push hook are relatively easy to fix, since the code modifications are still fresh and few. No time is wasted by the rest of the team either. In practice, a large fraction of bugs are caught by the non-regression tests run in the pre-push hooks. Only integration and performance bugs are left for the continuous integration system or the testing team to catch. Pre-push hooks also prevent you from pushing incorrect code inadvertently or out of overconfidence.

Do you use pre-push hooks? In particular, do you run tests at this point? If so, what kind of hooks do you use?

Stan Laurel: You remember how dumb I used to be?
Oliver Hardy: Yeah?
Stan Laurel: Well, I’m better now.

Comments:

  1. There are also [FTHA] and [grmbl] as useful tags.

  2. nice post! I really enjoy your style of writing and (obviously) the information about hg.

    thanks