mercoledì 29 febbraio 2012

What does it mean Lisp has no syntax?

Not long ago a colleague told me that his professor while talking about Lisp explained the class that Lisp has no syntax... and this colleague was just reporting this to show me how bad the teacher was.

However I've investigated a bit on the Lisp language and I've to agree, in some sense, with the professor. This post is tries to explain why this apparently nonsense is indeed a true fact for Lisp.

Let's start with a very simple Lisp function, one that computes the square of a number:

    (defun square (x)
        (* x x))

Apparently this doesn't look really different from what one would write in other languages, for example

    # python
    def square(x):
        return x * x

    // C or C++
    double square(double x)
    {
        return x * x;
    }

What is then the difference in Lisp?

The difference is that in other languages the syntax is fixed and the compiler is translating from characters to executable code, with no or very limited control on this process.

In Lisp instead the translation is performed in two separate steps

1. The "reader" converts characters into Lisp "source code"
2. The compiler transforms this code into an executable function

The typed characters (what is for other language the "source code") is read by a standard library function (the Lisp reader) and transformed into a structure representing Lisp code. This reader function is a general function used to convert from characters to Lisp data.

In the square case this structure is composed by a list of four elements where the first two are Symbol objects (named "defun" and "square"), the third element is another (sub)-list containing just the Symbol named "x" and the fourth element is a three-elements list where the first is the Symbol named "*" and the other two are the same Symbol named "x" seen before.

This data structure is what the compiler accepts as input to produce the executable code.
This is the real Lisp "source code"... and while it has a "syntax" it's not a syntax in the usual meaning of characters, because the real Lisp source code is not represented by characters, but by lists and atoms.

This transformation to a tree representation is performed also in most other languages, however the difference is that while in Lisp the programmer has control over each of these two separate steps in other languages the whole operation is monolithic and almost immutable, taking characters as input and generating executable code as output (some language allows some very primitive preprocessing or metaprogramming... and that's why I used "almost").

Note also that:

1. The reader function can be customized. If you need small changes then you can just extend the reading step.

2. You can produce the real "source code" for the compiler (i.e. the data structure) using other ways... for example generating it with a Lisp function. This indeed happens very often in Lisp... and a function that generates code is named "macro". Writing code that writes code is something that is "natural" in Lisp... "I write code that writes code that writes code for which I'm being paid".

So to sum it up:

Lisp has no (fixed) syntax in terms of characters. In Common Lisp there is a predefined "default" syntax that is understood by the standard lisp reader to convert from characters to Lisp data, but this is just a predefined syntax that can be customized and the data used as source code can also be generated in other ways.

martedì 10 agosto 2010

Never-failing functions

void *memAlloc(int size)
{
    void *p = malloc(size);
    if (p == NULL)
    {
        fprintf(stderr, "Out of memory...\n");
        exit(1);
    }
    return p;
}

Sometimes it's very annoying to deal with possible failures; the code gets more complex because of the error handling logic and in my experience often the code handling special situations is bug-ridden (and that's no surprise, it's code that is doomed to be complicated and to have strange flows).

The common way to handle an "anomaly" is to signal to the caller that something wrong happened, so that the calling code will be able to handle the problem itself. However in many many cases there is little that the caller can indeed do, except failing itself...
In a sense detecting and signalling an error is just dropping the burden of a problem to someone else, it's not solving the problem. The code for the caller will become more complex, and given that (hopefully) for every single function there will be many callers the total code complexity is increased every time we play this "pass the problem" game.

Wouldn't things be simpler if a called function could never ever ever fail ? Sure they would... but apparently that's impossible.

Or may be not ?

Indeed it's very easy to write functions that cannot fail... after all a function is some code that given certain preconditions will do something and ensure certain postcondition once the function returned; this is what is normally called the "contract" of a function.
In the code above the function contract is simple: given a non-negative number it will reserve somewhere that amount of bytes of memory and will return the address of this block to the caller. It's more or less like malloc, but just cannot fail. Why I say that the function cannot fail ? because if there's not enough memory it won't return to the caller at all! So it will never return to the caller without having first done its job and the caller code is consequently a lot simpler.

Wait.

But this means that I cannot handle a not-enough-memory condition in the caller. Yes... it means that. So this function should be used when there would be nothing reasonable to do in such an event. Does this happen oftern ? Sure it does. I think vast majority of memory allocations fit in this category for example.

Why ?

Ok... suppose that an allocation for a new string (20 bytes) just failed. Now what ... do we want to inform the user ? BWHAHAHAHAHA.... not even fopen is likely to be working, c'mon... you really want to open a window and ask something ? Do you want to open a file save-as dialog ???? you want to do this using less than 20 bytes of RAM.... yeah, sure... good luck!
Also please consider that in many many cases before telling you that there are not even 20 bytes of memory the system already filled all of your swap partition. Most propbably it has been running like a sluggish drunk turtle for minutes just making hard disk and laptop fan noises... and well before you get the out of memory condition the user already gave your sluggish noisy drunk turtle program the ctrl-alt-canc (or kill -9) goodbye. And rightly so.

Chanches are that if you even dared to call a save-as function and it actually did something then probably it didn't work correctly (why do I say this ? because I saw truly horrible code trying to handle out of memory and half-doing things in those cases... note also that not everyone checks every return code, and that C++ programmers sometimes even dare doing things like catch(...) and the "swallow" the exception.

When can a memory allocation fail, while still leaving a system usable ? For example when the amount of memory is huge. If you cannot allocate say 200Mb for a picture I think it's ok to try to keep running and inform the user, so in that case just call malloc. Hopefully your program will indeed be able to open a dialog and tell the user what is the issue and so on (well... you should be able to do that using a lot less than 200Mb if you're not using KDE4, so there are serious possibilities that the program will be still functional even if a 200Mb allocation just failed).

This little example could be extended a lot of course... for example by keeping a memory reserve and when an allocation fails the code can then start using the memory reserve and raise a flag. Raising the memory low flag could allow some code (at a very high level) to try to solve the problem.
In this variation the memory allocation function should die only if allocation failed and the reserve has already been activated. By using this approach still most of the code that needs to do memory allocation remains simple and correct and this because the memory allocation function has a simple contract that do not include failure.

So the idea for the pill is the following... sometimes returning a failure return code is a just way to make your program more complex and consequently more buggy; you should signal anomaly conditions only if you think that the caller can do anything to solve the problem. Your should consider that a function that cannot fail is a lot easier to use and in many cases the failure is an absurd condition and nothing you could hope to handle anyway. For those cases it may be worth to write functions that never fail.

If  you want to see what the opposite of this approach is then take a look at DirectX code... that is such a mess of possible failures that it looks like a joke. Unfortunately it's not a joke, and in my opinion it's no life to program that way.

Code pills

What's all this about ?

This is just a place where I'll try to save a code pill every now and then. What's a code pill ? It's just a little bit of code that however is not useful per se, but for the concepts it carries...

Of course I'll try to post only correct and working pieces of code, but the main idea is to use this blog just as a memory for the thoughts that in the end resulted in the code pill.

Copying a code pill and using it without understanding what's behind it is not something I would suggest exactly like I wouldn't suggest you to open a pharma cabinet and taking pills here and there without understanding.

As a programmers we always take responsibility about the code we write (ok... not for the law, but it's not absurd to assume that soon or late this kind of denial of responsibility will end like it did for every other professions in modern societies) and using a piece of code that we don't understand at least roughly and don't trust to some serious extent is just nonsense.