*** MOVED ***

NOTE: I have merged the contents of this blog with my web-site. I will not be updating this blog any more.

2006-05-31

GCJ: Quo Vadis?

Andi Vajda asked whether GCJ would cease to exist if Sun were to release the source code for Java and its tools under a really Free licence. I have also seen such questions asked on Slashdot, OSNews and other fora.

The response from Andrew Haley mirrors what I personally think of the situation - as long as there are hackers willing to maintain it, GCJ would continue to exist. Miguel de Icaza says that among the two types of hackers who usually work on Free Software in their spare time, GCJ and GNU Classpath only attract the Free Software idealogues and not those who want to get something free (gratis) working for them since Sun's JDK already works well for them and is free. Tom Tromey has a more philosophic take on the current situation - almost ascetic in fact. All this is enough to make a GCJ or GNU Classpath hacker reflect on the current state of Free Java, the utility of his contribution to it and the impact of a fairly Free release of Java source code from Sun. Despite being an extremely erratic contributor working on the fringes of Free Java, I cannot help doing the same.

It was about four years ago that I first flirted with GCJ. I wondered, for no particular reason as is usually the case with me, if it was possible to use GCJ to create native GUI applications with SWT in Java on Windows. I found out that the support was almost there and with a little effort and a lot of support from the GCJ hackers (especially Tom), I was able to add in that support and contribute it back to GCJ where it was accepted with minor modifications. That was my epiphany with Free Software. Until then I was more impressed by the fact that I could get so much decent-quality software for free than the liberty to make modifications to such software. But now I began to realise that having the source code available to you meant that you could change it yourself to fix its shortcomings and share your improvements with the other users. It also meant that the availability of the software did not depend on the solvency of the vendor or its willingness to maintain it. There were many other factors in favour of Free Software that became apparent to me over time. Suffice it to say that I finally understood what that twitchy, smelly and passionate preacher of Free Software was talking about all the time.

In time, my original itch died out (like so many of my digressions in life) but I continued to work on GCJ. I quickly moved from Windows to Linux (since that was the only platform that I enjoyed working on) and began fixing front-end bugs more out of a desire to help folks than to fix anything that was affecting any of my personal or professional work. This played a part in the fact that my track record with GCJ has been absolutely abyssmal (except perhaps in the area of contributing the most noise to the GCJ mailing lists). My pathetic time-management skills and a propensity to be carried away by even the slightest distraction have also played a big role. Since I worked on GCJ in my free time at home and I did not want to compromise too much on my personal life (spending time with my family, watching movies, reading books, meeting friends, getting enough sleep, etc.), I rarely found the time to debug and test anything except the most trivial of bugs. Finally, my tendency to "wait and watch" (first for GCJX and now for ECJ) has not helped matters much either.

There are some inherent problems with GCC and GCJ too. The GCJ front-end seems to have been written in a hurry in order to get as many things working as fast as possible with not much thought given to overall maintainability. The people who originally wrote it have moved on to other things in life leaving others with little idea of how it all works. Subsequent hackers (including me) have always made incremental changes to fix immediate issues rather than perform any big refactoring of the code, with the natural result that the code has become even more unwieldy now than before. To fully bootstrap GCC, especially with checking enabled, and then to run its testsuite takes an awful amount of time, especially on slightly older hardware (like my otherwise perfectly capable P3-based system). This is a huge barrier for most prospective hackers. (I reserve my rant about the disastrous effects of bundling several language front-ends and their ever-bloating runtime libraries into a single compiler system for another day.) Until recently, the ubiquitous tree data structure was used in funky ways for almost everything in GCC. There is precious little documentation for a programme of this complexity and some parts of this documentation is out-of-date. The best way to understand stuff in GCC is to read through the source code, to watch the operation of the relevant parts in a debugger and to ask questions on the mailing lists when you do not understand something even after doing all this.

All these are problems that can be overcome one way or the other. The biggest problem with GCJ however is the sheer paucity of hackers willing to work on it to improve it compared to the number of people willing to use it and reporting problems with it. This situation is particularly severe for Windows. Were it not for Red Hat's sponsorship of some critical GCJ hackers (and the heroic efforts of Tom in particular), GCJ would have been in a very bad shape by now. This situation really makes me realise how true Miguel's observations are with respect to hackers of Free Software and Free Java.

A Free Java from Sun would not obviate the need for GCJ though. I personally feel that ahead-of-time compilation to native code providing more opportunities for aggressive optimisations (platform-agnostic as well as platform-specific) and a more straightforward integration with C/C++ via CNI are enough to show the utility of GCJ orthogonal to the status of the freedom provided by Sun's JDK.

This post has already become the longest I have ever posted, so I will reserve my rant about how Java the language and its bloated "standard" runtime is not even worth spending so much time and effort on in the first place, for another day.

Hello World

I will now be using this weblog instead of my Advogato diary.

Moving

I am now moving to rmathew.blogspot.com for blogging. I find Advogato a bit painful to use for blogging. I also do not want to be restricted to only talking about hacking on Free Software.

(Originally posted on Advogato.)

2006-05-17

Google and Maths

"Fuzzy Maths", an article on Google in the latest edition of The Economist, contains this interesting bit:

Google constantly leaves numerical puns and riddles for those who care to look in the right places. When it filed the regulatory documents for its stockmarket listing in 2004, it said that it planned to raise $2,718,281,828, which is $e billion to the nearest dollar. A year later, it filed again to sell another batch of shares -- precisely 14,159,265, which represents the first eight digits after the decimal in the number pi (3.14159265).

Their famous recruitment campaign and their very name further reinforce the impression of their obsession with Mathematics.

(Originally posted on Advogato.)

2006-05-10

Peer to Patent

The US Patents and Trademarks Office will soon try out Peer to Patent as a pilot project. This is great news. It is really important for silly patents to get rejected upfront than be granted and then used to bully everyone into either paying up an extortion fee or engaging in costly lawsuits. Unfortunately, there is still the problem of lots of such silly patents having already been granted and used for corporate "defence funds" (an equivalent of the "Mutual Assured Destruction" strategy) or towards unscrupulous ends.

The Economist has a nice set of balanced articles on patents and other IP-related topics.

(Originally posted on Advogato.)

Security: The 3 As and the 3 Rs

(I am just collecting my thoughts here; I do not require anything like this right away.)

A useful framework for security should provide:
  • Authentication - verifying that the user is indeed who he claims to be.
  • Authorisation - verifying that the user is indeed allowed to do what he wants to do.
  • Auditing - recording the attempt to do the intended action, its outcome and whether the action was indeed done.

The authentication framework should be able to able to plug into various authentication mechanisms (OS-based, LDAP-based, etc.), be flexible enough to accept various types of credentials (username/password, PKI certificate, etc.) and reliably establish the "Identity" of the user.

The authorisation framework should allow the specification of:
  • Rights - what is allowed.
  • Roles - who is allowed to do it.
  • Realms - where are they allowed to do it.

Role-based authorisation allows for the maximum flexibility compared to the direct checking of the Rights of the given Identity. An Identity could be associated with multiple Roles. Realms establish domains of privileges - for example, a person has administrator privileges on his desktop PC but is just an ordinary user on the LAN. Rights could be positively stated ("Allow Foo") or negatively stated ("Disallow Bar"). Authorisation could be inclusive (at least one Role associated with the Identity has the Right) or exclusive (no Role associated with the Identity should be denied the Right). I personally favour positively stated Rights and inclusive authorisation.

The auditing framework would be used for non-repudiation, so it should have integrity (only the auditing framework could have written out a given audit record) and an almost transactional association with the respective action (record an action if and only if it was actually done).

Of course, in real "enterprise" software we end up with various degrees of compromise on each of these aspects.

(Originally posted on Advogato.)

2006-05-05

Faster Logging

When you have an application that must log information (for auditing, debugging, etc.) but still run as fast as possible, it is rather wasteful to always dump fully-formatted human-readable trace records. It's far better to dump a short binary record indicating the message identifier, parameters for the message (if any), timestamp, process/thread identifier, etc. that can be processed later for human consumption using a separate "trace formatter" tool. This way you save on processing time and disc space but make it slightly inconvenient to view the log files.

On UNIX-like systems, utmp and wtmp records are created and processed this way. I have also seen this kind of logging in IBM's AIX operating system and its CICS transaction processing monitor. Why then do several modern "high-performance" applications still insist on using the slower and more bloated method?

(Originally posted on Advogato.)

2006-05-02

Planet GCC

There is now a Planet GCC aggregating the feeds from Planet Classpath and the blogs of a bunch of GCC hackers. If you know of a blog of a GCC hacker that is not directly or indirectly aggregated here, please let Dan know. Thanks to Dan for this initiative.

(Originally posted on Advogato.)