*** MOVED ***

NOTE: I have merged the contents of this blog with my web-site. I will not be updating this blog any more.

2007-07-25

ICFPC 2007

I spent most of this weekend taking part in the ICFP contest for 2007. As in 2005, I took part in the contest with Yumpee as my team-mate. Our team was called "Kalianpur Bakaits" (an IIT-K reference) in the beginning, but we changed it to "The Great Indian Rope Trick" later, for reasons that should become obvious in a while.

This year's task was very similar to that from last year, but it was a bit tougher and even more fun. The organisers surely must have spent a lot of time and effort in putting together the task description and the supporting materials. There were no mistakes discovered in the task description for this year's contest (unlike those for the previous years' contests). The mailing list also had an unusually low-traffic during the contest - perhaps it was because the task description was precise enough, perhaps it was because a lot of people were finding it difficult just to get started or perhaps the contest was affected by the release of the latest Harry Potter book during the same weekend.

This year's task was to save an alien named "Endo". Endo was a "Fuun" travelling in a spaceship named "Arrow", when it had a mishap with an "interstellar garbage collector" that picked it and then dumped it on Earth. This caused critical damage to both Endo and Arrow. Our job was to repair Endo's DNA in such a way that Arrow is able to restore it to its former condition using its limited remaining power.

Endo's DNA is a long sequence comprising four bases, 'I', 'C', 'F' and 'P'. This sequence can be interpreted to create an RNA sequence comprising the same four bases that in turn can be used to build the proteins necessary to restore Endo. Prefixes comprising the four bases can be attached to a DNA sequence to modify its behaviour - we had to find a cheap and low-risk prefix that Arrow could process to repair Endo's DNA.

This translates to creating an interpreter for the DNA sequence that produces an RNA sequence, creating another interpreter for this RNA sequence that creates an image out of it and finally discovering a prefix that could restore Endo. The organisers provided us the DNA sequence for Endo, a "source" image resulting from interpreting this DNA sequence and a "target" image representing a fully-restored Endo. We were asked to create a DNA prefix that results in an image that is as close to the target image as possible, while being as short and as cheap to process as possible.

The task description was detailed enough for us to readily build a DNA to RNA converter as well as an RNA to image converter. The challenge was to do the processing quickly enough since the given DNA sequence required about 2 million iterations over a string that was about 7.5MB long. I created the DNA-to-RNA converter, while Yumpee built the RNA-to-image converter.

Since we had to splice and join huge lists while interpreting the DNA, I started off with a C implementation that used Lisp-style "cons" cells. While it seemed to perform well (but not good enough for the task), it had hard to track bugs that caused segmentation faults, not to mention memory leaks. I then tried using Java for creating the interpreter, which resulted in a correct but even slower interpreter. By this time, I had wasted two out of the three days allotted for the task. We had not made much progress while a lot of teams seemed to have discovered the same "magical" prefix comprising 28 bases that had boosted their rankings.

This is when I finally decided to act on Yumpee's suggestion of looking into using "ropes" as the underlying data structure. He had been suggesting it repeatedly since the time he had seen the task description, but I kept putting it off since I was not familiar with it. I decided to check out the "cords" implementation of ropes that comes with Boehm-GC. I modified my original interpreter in C to use cords instead of my home-grown and buggy cons-cells implementation.

The difference in performance was startling! The very first implementation ran through all the 2 million iterations in about 10 minutes on my PC and in about 4 minutes on Yumpee's PC. The code had also become much simpler to read and much closer to the pseudo-code provided in the task description. I debugged it a little to remove a bug that resulted from a misreading of the specification and then handed it over to Yumpee. By this time, he had a working RNA-to-image converter (he was also busy barking up the wrong trees till that time) and had also discovered the magic 28-base prefix. Unfortunately for us, we had just nine hours left to finish the task.

We discovered that the fun had just started. Embedded in the DNA were a series of clues that formed a virtual treasure-hunt, involving encrypted pages, EBCDIC-encoded text, fractals, callable functions, etc. We discovered a lot of clues but we couldn't find a prefix in time that would improve our rankings. It felt good that we could make so much progress by the end of the contest, unlike the last time. However, I felt like an idiot for not having listened to Yumpee earlier and saving myself and our team a lot of wasted time.

As I always end up saying after such contests, I hope we have a better luck the next time.

Some more reports: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc.

Update (2007-07-26): I tweaked the code for the DNA-to-RNA converter a little bit to avoid creating a lot of unnecessary cords and that reduced its running time to about 1 minute 35 seconds on my PC (about 7 times better than before) and about 30 seconds on Yumpee's PC. Right now the garbage collector itself is the biggest consumer of CPU cycles (about 45% on the whole), but disabling it leads to the programme quickly running out of memory on my PC.

2007-07-11

"Finance, Investments 'n' Trading"

"Finance, Investments 'n' Trading" is a weblog by Shobhit that tries to bring some sanity to the general mania currently surrounding the Indian stock markets.

You should read his articles in chronological order (he provides a helpful "Table of Contents" article for this purpose). He might sound too pessimistic to some and I do not necessarily agree with everything that he says, but the articles are still a good and recommended read, particularly if you actively buy and sell stocks or are thinking of getting into it.

The Indian stock markets have been in a prolonged and an almost-continuous bull run for about three to four years. So many people have apparently made so much money with such ease that everyone from students and housewives to army men and retired government workers want to jump into the bandwagon for fear of being left out. It does not help that newspapers, magazines and TV channels provide a disproportionate coverage of the "excitement" surrounding the Indian stock markets as if it is a form of sport.

When caution and rational reason are abandoned in times of general euphoria, many people are likely to get their hands burnt (some times without even realising that they have made a net loss, taking inflation, taxes and trading charges into account). The only people who consistently make money in such situations are the stockbrokers (for example, read Warren Buffett's "Chairman's Letter" to Berkshire Hathaway Shareholders for 2005 (PDF) from page 17 onwards).

I don't think that investing in stocks is necessarily a bad thing or that the current bull run is not backed by a real growth in the Indian economy. I just wish that people think a bit more rationally, keep realistic expectations of returns and figure out how to calculate their real loss or gain from trading, before jumping in.

I wish that more people read and understand the advice given in Benjamin Graham's superb book "The Intelligent Investor".

2007-07-04

Expiry Dates

If medicines and food items become dangerous to consume after their expiry dates, shouldn't pesticides become more powerful after their expiry dates?

2007-07-02

Dragon Ball Z

Many years ago, a cartoon TV channel in India started showing Dragon Ball Z. They showed 53 episodes of this series comprising the Vegeta Saga and the Namek Saga. Just as the series got really interesting, they yanked it off the air without notice and without even a word of apology.

Loyal viewers of the series were aghast. They pleaded with the channel to show them the rest of the series as well, rather than leaving them on tenterhooks. The channel could not care less. The viewers were frustrated and cursed the channel using the choicest of expletives. They did not have much choice.

Several years later, the same cartoon channel starts showing Dragon Ball Z once again. Some of the old viewers hope the channel would show the entire series this time and watch the episodes once again to revive their memories.

They were naïve. The channel showed the same 53 episodes of the series and then abandoned it yet again at the same critical point in the story.

They were idiots. The channel as well as the viewers.

2007-06-23

Data Visualisation with Gnuplot

Visualisation of data using charts and other types of plots is immensely helpful in getting a feel for it without carrying out detailed analyses. Gnuplot is a freely-available tool for data visualisation that is also very simple to use. The article "Visualize your data with gnuplot" is a nice introduction to this tool. Gnuplot proved to be quite handy for me recently.

I wanted to find out whether the Unit Price of a particular fund varies in line with the popular equity market indices in India, the NSE S&P CNX Nifty and the BSE Sensex. The current values of these indices are always readily available in the newspapers and on television channels, while I have to use the web-site of the fund to get its current Unit Price. If the Unit Price of the fund varied in line with the values of the equity market indices, it would save me some effort in determining its current worth.

The portfolio of the fund in question is almost entirely based on equities. It holds the shares of some of the biggest and the most stable companies across a variety of industry sectors. It was therefore reasonable to suspect that its Unit Price would vary in line with the values of the indices. However, it is not as diversified as the indices and it might not have invested across sectors in the same proportion as that represented by either of the indices.

It was easy to obtain the historical closing prices of the two indices and the Unit Prices of the fund. To keep things simple, I only considered the current month for making this comparison. To simplify things further and to improve the visualisation, I normalised the first value in each series to "100" by scaling all the values appropriately. (This is a technique that I have often seen put to good use in The Economist.)

Using Gnuplot, I obtained the following chart:



This gave me the desired answer right away!

In case you're curious, here are the Gnuplot commands I used for creating the chart above:

# We want PNG output.
set terminal png
set output "plot.png"

# Specify how and where the key (legend) for the chart should
# appear.
set key bottom right
set key width 2 box

# Tweak the look of the chart.
set title "Fluctuations in Unit Prices Relative to Market Indices"
set xlabel "June 2007"
set ylabel "Normalised Value"
set grid

# The data on the X axis represent time values.
set format x "%d"
set xtics "01-Jun-2007", 3600*24
set xdata time
set timefmt "%d-%b-%y"
set xrange ["01-Jun-2007":"22-Jun-2007"]
set yrange [95:100]

# Plot the chart using data files normalising the values in
# each case.
plot \
'nifty.dat' using 1:($5)/42.9705 \
title 'NIFTY' with lines linewidth 2, \
'sensex.dat' using 1:($7)/145.7075 \
title 'SENSEX' with lines linewidth 2, \
'fund.dat' using 1:($2)*100.00/57.0337 \
title 'FUND' with lines linewidth 2


By the way "Junk Charts" seems to be a blog devoted to criticising charts that appear in various magazines and web-sites in general and in The Economist in particular.

2007-06-12

iTunes and QuickTime

iTunes and QuickTime would probably be the applications by Apple that most non-Mac-OS-X users get exposed to and it is a shame that these applications are so frustratingly unintuitive, not to mention bloated and slow. Are these really made by the same company that is famous for putting a lot of emphasis on the usability of its hardware and software products?

2007-05-26

Indic Scripts and Linux

If you have the fonts for Indic scripts (for example, the Lohit fonts), Firefox on Linux is able to display the Devanagari text on sites like BBC Hindi and Google News in Hindi. (Devanagari is the primary writing system for languages like Hindi, Sanskrit, etc.) However, if you are using the builds released by mozilla.com, you would notice that the matras (diacritics) are not applied properly to form the correct ligatures. For example, the word "हिन्दी" ("Hindi") itself is not rendered properly. Konqueror does not suffer from such problems.

It turns out that Firefox does not support complex text layout because it doesn't use Pango in the officially-released builds (Firefox 3 will support it by default). You have to either compile it yourself from the source and enable the support for Pango by using --enable-pango, or use a build that has Pango enabled - for example, the builds provided by the Fedora Project. (Setting the environment variable MOZ_ENABLE_PANGO to "1" had no effect for me with Firefox 2.0.0.3.)

On Fedora Core 6 (FC6), it is very simple to get this working:
  1. Install the fonts for the Indic scripts you are interested in. For example, "sudo yum install fonts-hindi" , "sudo yum install fonts-malayalam", "sudo yum install fonts-kannada", etc.
  2. Install a Firefox build for Fedora using "sudo yum install firefox". Note that FC6 installs Firefox 1.5 by default - if you prefer Firefox 2.0 instead, you can install it using "sudo yum --enablerepo=development install firefox".


By the way, I recently came across Omniglot, a site about the writing systems of almost all known human languages, existing or extinct, naturally-evolved or artificially-created. I found it extremely fascinating and insightful. For example, I did not know that Devanagari was not considered to be an "alphabet" but an "abiguda". Check out the International Phonetic Alphabet (IPA) that can represent almost all spoken languages. How about Loglan (and its freer derivative, Lojban) that claims to be a "logical" language? (I first came across the IPA on Wikipedia, where it is used to provide the pronunciation for some terms. xkcd is where I first read about Lojban.)

2007-05-23

ICFP Contest 2007

The 10th ICFP Contest is scheduled for the weekend of 20th July 2007. I hope it turns out to be as fun as the one held last year.

There is already a blog written by one of the organisers that contains some teaser puzzles. (Do these images use some form of steganography or can we simply work out the graphical transformations applied to the original image and apply them in reverse to obtain the desired image? I wonder.)

"Superstition Reigns"

"Superstition Reigns" by Rahul Singh, published in The Times of India today:
"Amitabh Bachchan, an icon for tens of millions of Indians, makes his daughter-in-law [Aishwarya Rai] perform outlandish ceremonies because she is supposedly under the evil influence of Mars. Politicians routinely consult astrologers before taking important decisions, despite abundant proof that astrology is no science at all, just quackery. Horoscopes continue to be cast in most families and palmists consulted. A newly-inducted cabinet minister insists that her bungalow be completely redesigned because it does not follow vaastu principles, a system nobody had heard of till only a few years ago."

Superstition in all its ugly forms is sickeningly pervasive in India, even among educated people who ought to know better. We waste a lot of time and money and unnecessarily make life difficult for ourselves as well as others, all in the name of something that doesn't withstand rational scrutiny.

2007-05-19

Firefox 3 and Linux

Mike Connor blogs about the proposed requirements for Firefox 3 to run on Linux. A nasty surprise for me was the inclusion of GNOME as a dependency. While the GTK/Pango/Cairo/etc. requirements are quite understandable, I don't understand why it should need GNOME. Many of us are happy with KDE or Xfce and would like to avoid the bloat and the dependency hell of GNOME for the sake of running a browser.

As an aside, Firefox on Linux also seems to behave quite differently from Firefox on Windows. For example, on Linux Firefox seems to consistently consume more CPU time and memory than on Windows. Some pages are rendered differently on Windows and Linux (perhaps due to the availability, or otherwise, of the fonts requested by the page designer and the rendering infrastructure). I have personally also noticed bug-337093 on Windows but not on Linux.