Wednesday, February 29, 2012

What does it mean that Lisp has no syntax?

Not long ago a colleague told me that his professor, while talking about Lisp, had told the class that Lisp has no syntax. My colleague was reporting this to show me how bad the teacher was.

However, I've looked into Lisp a bit, and I have to agree with the professor, in some sense. This post tries to explain why this apparent nonsense is actually true for Lisp.

Let's start with a very simple Lisp function, one that computes the square of a number:

    (defun square (x)
        (* x x))

At first glance this doesn't look very different from what one would write in other languages, for example:

    # python
    def square(x):
        return x * x

    // C or C++
    double square(double x)
    {
        return x * x;
    }

What is then the difference in Lisp?

The difference is that in other languages the syntax is fixed: the compiler translates characters directly into executable code, and the programmer has no control, or only very limited control, over this process.

In Lisp, instead, the translation is performed in two separate steps:

1. The "reader" converts characters into Lisp "source code"
2. The compiler transforms this code into an executable function

The typed characters (what other languages call the "source code") are read by a standard library function, the Lisp reader, and transformed into a data structure representing Lisp code. The reader is a general-purpose function for converting characters into Lisp data.

In the square case this structure is a list of four elements: the first two are Symbol objects (named "defun" and "square"), the third is a (sub)list containing just the Symbol named "x", and the fourth is a three-element list whose first element is the Symbol named "*" and whose other two elements are the same Symbol named "x" seen before.
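The structure just described can be inspected directly by calling the reader on a string. This is only a small sketch: the variable name `*form*` is made up for the example, and note that with its default settings the standard reader upcases symbol names, so the symbols print as DEFUN, SQUARE and X.

```lisp
;; Run the standard reader on a string instead of a file or the REPL.
;; *FORM* is a hypothetical name used only for this illustration.
(defparameter *form*
  (read-from-string "(defun square (x) (* x x))"))

(listp *form*)             ; true: the result is an ordinary list
(length *form*)            ; 4
(first *form*)             ; the symbol DEFUN
(symbolp (second *form*))  ; true: SQUARE is a Symbol object
(third *form*)             ; the one-element list (X)
(fourth *form*)            ; the three-element list (* X X)

;; The X in the parameter list and the X's in the body are the same object.
(eq (first (third *form*)) (second (fourth *form*)))  ; true
```

Nothing has been compiled or defined at this point; `*form*` is just data that happens to look like a function definition.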

This data structure is what the compiler accepts as input to produce the executable code.
This is the real Lisp "source code". It does have a "syntax", but not in the usual sense of rules about characters, because the real Lisp source code is made of lists and atoms, not characters.
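To make the point concrete, here is a minimal sketch in which the compiler is handed a list assembled by ordinary function calls rather than text parsed by the reader. The names `*square-source*` and `*square-fn*` are invented for the example, and an anonymous `lambda` form is used instead of `defun` so that nothing global gets defined.

```lisp
;; Assemble the source of a squaring function from symbols and lists.
;; The form is built by calls to LIST, not parsed from a string.
;; *SQUARE-SOURCE* and *SQUARE-FN* are hypothetical names.
(defparameter *square-source*
  (list 'lambda (list 'x) (list '* 'x 'x)))

;; COMPILE accepts the data structure directly and returns a function.
(defparameter *square-fn* (compile nil *square-source*))

(funcall *square-fn* 7)   ; => 49
```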

Most other languages also transform the text into a tree representation. The difference is that in Lisp the programmer has control over each of the two steps, while in other languages the whole operation is monolithic and almost immutable: it takes characters as input and generates executable code as output. (Some languages allow some very primitive preprocessing or metaprogramming, which is why I said "almost".)

Note also that:

1. The reader function can be customized. If you only need small changes, you can just extend the reading step.

2. You can produce the real "source code" for the compiler (i.e. the data structure) in other ways, for example by generating it with a Lisp function. This happens very often in Lisp, and a function that generates code is called a "macro". Writing code that writes code is "natural" in Lisp... "I write code that writes code that writes code for which I'm being paid".
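Both points can be sketched in a few lines of Common Lisp. Everything named here is hypothetical and chosen only for illustration: the bracket notation `[a b c]`, the helpers `read-bracket-list` and `read-with-brackets`, and the macro `square-of`. The reader change is applied to a copy of the readtable, so the default syntax is left alone.

```lisp
;; 1. Customizing the reader: make [a b c] read as (list a b c).
;; READ-BRACKET-LIST is a hypothetical helper, called when the reader sees "[".
(defun read-bracket-list (stream char)
  (declare (ignore char))
  (cons 'list (read-delimited-list #\] stream t)))

;; READ-WITH-BRACKETS (also hypothetical) reads one form from STRING
;; using a private copy of the current readtable.
(defun read-with-brackets (string)
  (let ((*readtable* (copy-readtable)))
    (set-macro-character #\[ #'read-bracket-list)
    ;; Make "]" end a token the same way ")" does.
    (set-macro-character #\] (get-macro-character #\)))
    (read-from-string string)))

(read-with-brackets "[1 2 (+ 1 2)]")   ; the list (LIST 1 2 (+ 1 2))

;; 2. Generating code with a function: a macro.
;; SQUARE-OF receives its argument as unevaluated data and returns a new
;; form; GENSYM makes a fresh symbol so the argument is evaluated only once.
(defmacro square-of (expr)
  (let ((tmp (gensym)))
    `(let ((,tmp ,expr))
       (* ,tmp ,tmp))))

(square-of (+ 1 2))                    ; => 9
(macroexpand-1 '(square-of (+ 1 2)))   ; a LET form binding a fresh symbol to (+ 1 2)
```

In both cases the compiler ends up seeing plain lists and symbols; it neither knows nor cares whether they came from brackets, from parentheses, or from a function that built them.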

So to sum it up:

Lisp has no (fixed) syntax in terms of characters. Common Lisp has a predefined "default" syntax, which the standard Lisp reader uses to convert characters into Lisp data, but that syntax can be customized, and the data used as source code can also be generated in other ways.