Jay Taylor's notes
back to listing indexOn elisp and programming in general: Lisp for C++ programmers
[web search]For whom?
One old good friend of mine, whom I respect a lot and who is a verygood C++ programmer, recently asked me to give him an example of howit's possible make new language features in Lisp. He's aware of Lisp'sability to invent new syntax, and he's also excited about C++11. So hewonders how is that possible to introduce new syntax into yourlanguage all by yourself, without having to wait for the committee toadopt the new feature.
I decided to write this article for C++ programmers, explaining coreLisp ideas. It's a suicide; I'm sure as heck that I'll fail. Greatnumber of excellent publications on Lisp for beginners exist, andstill there are people who cannot grasp what's so special about it.
Nevertheless, I decided to try. Yet another article with introductionto Lisp won't harm anybody, nor will it make Lisp even less popular.Let's be honest: nobody reads this blog, anyway. :)
Why Lisp matters?
Lisp matters because that's a language which gives a programmer thepossibility to extend its own syntax. That's as if you could inventnew keywords in C++, for example, or special syntax for std::map
literals:
std::map<std::string, int> m1 = { map: ("one", 1), ("two", 2), ("three", 3)};std::map<std::string, int> m2;m2["one"] = 1;m2["two"] = 2;m2["three"] = 3;typedef std::map<std::string, int>::value_type vtype;vtype things[] = [vtype("one", 1), vtype("two", 2), vtype("three", 3)];std::map<std::string, int> m3(&things[0], things + sizeof(things)/sizeof(things[0]));
All the m1, m2 and m3 are equal. The first example features a newsyntax for giving std::map literals; I made it up, it doesn't reallyexist. Just imagine that we could invent it ourselves.
Why Lisp matters for C++ programmers?
Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp. | |
Greenspun's tenth rule | |
And you're right: we were not out to win over the Lisp programmers; we were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp. Aren't you happy? | |
Guy Steele (on Java) |
Lisp has something that C++ has always wanted to have. Lisp is (tosome extent) something that C++ has always striving to become. Lispcontains some forgotten ideas which were popular in early days ofcomputing and went into oblivion as programming became more popular.Lisp will give you some useful insights that you won't find anywhereelse.
Is Lisp a silver bullet then?
Certainly not. If it were, no new programming languages would havebeen appearing after 1962.
I won't go into details about deficiencies of Lisp. That's anotherproblem to be discussed; an extremely painful problem, in fact. Justkeep in mind that I'm not claiming C++ is shit and Lisp is a rose.
What are fundamental difficulties with understanding Lisp?
If you're a C++ programmer wanting to understand how a language canbe so flexible that it enables to introduce new syntax, there are somefundamental difficulties for you to understand Lisp as an instance ofsuch language. That's because Lisp is not "C++ + the ability to definenew syntax". Lisp is much different from C++ in ways that have nothingto do with syntactic extensions. I mean, dynamic typing, functions asfirst-class values, no OOP in the first place, no templates, etc.
When a C++ programmer starts to delve into some Lisp introduction,he usually advances up to the point where Lisp looks completely alienand impractical, and then he's not able to move any forward. Syntacticextensions, as you guess, lie long after the point where the C++programmer stops.
So, what makes Lisp look alien and impractical to our C++programmer?
-
Dynamic typing. The dilemma of static/dynamic typing is one ofthose half-philosophy-half-programming questions that are not going tobe resolved in favor of either side any time soon.
I believe that Lisp core ideas are incompatible with the idea of static typechecking, at least with mandatory typechecking. I cannot explain why, without explaining those core ideas themselves. Instead, let's try to look at static type systems in the following light.
A static type system is something non-primary to programming, something additional, ancillary. Imagine an interpreter for a simple toy language: you give the interpreter a piece of code and it just executes (evaluates) the code. Static typing means that before running or compiling, there must be an additional phase of processing – type checking. That's why static typing is not necessary; there are dynamic languages without it, such as Python or JavaScript. As a downside of it, these languages have to perform type checks at runtime. For example, in Python you just write:
def process(a, b): return a + b + 20
without anywhere mentioning types of a and b. But when "+" gets executed in the course of your program run, it has to make sure a and b are actually numbers.Yeah, I know, you're saying that's bad. Having to check types dynamically over and over again instead of just checking them once is bad. I agree to some extent. But I want you to pick up 1 thing: static typing is not something vital to programming. It's not vital in the business of making computer do what you want. Static types are useful and arguably superior to dynamic types, but they are not necessary. It's an additional feature which is good and fine but not necessary.
Lots Of Idiotic Silly Parentheses.
Lisp has all the visual appeal of oatmeal with fingernail clippings mixed in.
Larry Wall Which is more readable:
a[x] += b[x-1] + 3 * b[x+1];
or:
(aset a x (+ (aref a x) (aref b (- x 1)) (* 3 (aref b (+ x 1)))))
Those parentheses seem to be obfuscating everything. Yes, I agreethat Lisp is superficially less readable than most traditionallanguages. "Superficially" means that Lisp looks worse foruninitiated, unprepared and inexperienced people; once you get someexperience with it, you learn to see past paretheses.
But generally, that's true that Lisp looks ugly, especially Common Lisp. Although there are some nicer dialects, like Scheme or Clojure.
Why does Lisp look ugly to normal people? Ironically enough, thisugliness is the price we pay for being able to introduce new syntax. Lisp cannot have special syntax as C++ does, because in a languagewhere anyone can create new features and DSLs, no syntax is special.In fact, many people claim that Lisp "has no syntax". This really means that Lisp has uniform syntax: every piece of code is just alist delimited with parentheses.
To know what the list really is and what it means, one must look at its head element:
- if the head is the name of a function, the list is just a functioncall form, and remaining elements are arguments to the function;
- if the head is the name of a special (core) operator, then thislist is a special form with special meaning. There's a fixed number ofspecial operators (26 in Common Lisp, 5 in Scheme, about 27 in EmacsLisp). Special operators roughly correspond to keywords in C++. Theyare generally these: assignment, looping constructs (likewhile/until/for), introduction of local variables, lambda,throw/catch. Note that primitive operators like +, -, / and * areordinary functions in Lisp and are not special in any way;
- if the head is the name of a macro, the list is a special syntaxconstruct which is to be translated into a special operator form or afunction call form (one of the above 2). How this is done is the pointof this article and is described below.
Now I hope you see at least some justification why Lisp isugly.
Only lists.
Some folks after initial acquaintance with Lisp make erroneousconclusion that list is the only composite data structure available inthe language. Indeed, Lisp stands for "LISt Processing", so it's allabout lists.
No! Lisp programs manipulate all kinds and diversities of datastructures; this depends on concrete Lisp dialect you're using.Common Lisp has C-style structures, vectors and growable vectors,multidimensional arrays, strings, complex numbers and even objects andclasses.
So, why are lists special then?
Lists are special only because code is represented withlists. That's all about them. You can have lists of numbers, ofstrings, of anything you want (lists are heterogeneous because ofdynamic typing). And a piece of code is also a list! That's why listsare special. Code can be manipulated as data – that's the cornerstoneof Lisp, its heart.
When you regard a list as code, you typically call it "a form". So "form" means "list which is treated as code".
It is not compiled
Some C++ programmers (I believe not very many of them, by now) canhardly imagine how it's possible not to have a phase of compilation.If you're one of those, I'm trying now to convince you: compilation isnot necessary, much in the same way as static typing is not necessary.
What is compilation? It is translation into low-level code. There are2 ways to run a program: either run it directly with a special runner,or have it translated into machine-level language and run it directlyby the hardware. The first way is interpretation, and the "runner" isconventionally called "an interpreter"; the second way is compilation,and the program which performs translation is compiler.
The distinction between interpretation and compilation is not strict.Consider byte-codes. Your program can be compiled into byte-code, andthe byte-code is interpreted by a special byte-code interpreter.Moreover, there's also JIT which means that compilation can take place on-the-fly and is optional.
C++ is a performance-oriented language. It is unthinkable how itcan be interpreted. Or is it not? Consider debuggers: they allow youto enter arbitrary C++ expressions into the Watches window. Each timethe debugger stops, the value of expression is reevaluated anddisplayed into that window. How does the debugger do it? The debuggerinterprets the expression. Oh, excuse me: it can actually compileit, and then rerun the code each time it needs to. In this case,debugger is somehow able to compile C++ code on-the-fly, and runit.
I believe you're a little bit more convinced now that even in C++ compile/interpret distinction is not very firm.
Compilation is not necessary: it is a preparational step aiming toimprove performance of your program by translating it into lower-levellanguage. C++ is special because it targets machine language forperformance reasons, so compilation in the case of C++ is necessary. But it's not necessary in the general case.
Lisp can be both compiled and interpreted (even simultaneously);this depends on the implementation of a concrete dialect. For thepurpose of this article, I won't talk about compilation of Lisp codeexcept places where I intend to touch it explicitly. You can assumefrom now on that Lisp is interpreted, because thinking so is easier toget the idea of how it works. I promise I'll give special attention tocompilation later.
Heart of Lisp: code is data
Enough about difficulties with Lisp. Let's get to Lisp itself.
In C++, your code is completely gone once it's compiled. Code isjust a piece of text which is consumed by the preprocessor andcompiler and is transformed (translated) into some machine-dependentrepresentation and then executed.
In Lisp, code is live. Code is represented with certain datastructures that you can manipulate by means of other code. Code is notgone when you compile your program. This opens up the possibility toimplement new language features and do metaprogramming. You can writecode that writes code, because "writing code" in Lisp is no differentthan "creating lists", and "running code" is the same as "evaluatinglists".
By the way, in Lisp terminology, "evaluating" code means just"running" or "interpreting" code. Evaluation is orthogonal tothe compilation/interpretation dualism. Evaluation has to dowith carrying out computations; the compilation vs. interpretation thing is about how to carry out computations: directly or viapreliminary translation.
Lisp is thus self-hosting: your Lisp program can create new Lispprograms or modify itself. Lisp has no compile-time/run-timedistinction, and that's what the power of Lisp is all about. Even if aLisp implementation provides compiler, there's still nocompile-time/run-time dualism: you can run arbitrary Lisp code duringcompilation, and you can call compiler during the run-time of yourprogram. That means you can bake functions out at run-time and compilethem (with all optimisations), and then call them.
Evaluation of function call forms
So, now that you're supposed to be intrigued, let's talk about how code is actually represented as data structures, and how to dometaprogramming. Be patient: the trick is extremely simple, butunfortunately it cannot be explained as simply.
We shall begin with the notion of evaluation. As you know, in Lispcode is data, and "to run code" means "to evaluate data". Any kind ofdatum can be evaluated. But lists, as you guess, are of specialinterest.
Consider this piece of code:
(+ 10 20)
That's a list consisting of three items. The first one is thesymbol "+", the second is number 10, and the third is number 20. (Youdon't know yet what "symbol" means, I'll get to it later. For now,symbol is a data type by which variables and identifiers arerepresented in programs.)
Let's enter (+ 10 20)
at the REPL (Read-Eval-Print Loop). I'll use Emacs Lisp:
ELISP> (+ 10 20)30
We see that (+ 10 20)
has evaluated to 30. How wasthat done? Look at the bulleted list above, where I showed possiblekinds of forms. We have a list; its head element – symbol "+" – is thename of a (primitive) function that adds numbers. So this list is afunction call form. The other 2 items are arguments: they are passedto the function, which adds them and returns 30. Pretty simple,yes?
ELISP> (+ 3 (* 2 4))11
This is also a function call form. But the third item of the listis itself a list (a nested list). This means that it must be evaluatedbefore passing to the "+" function. So, the evaluation is carried outthis way:
(+ 3 (* 2 4)) → (+ 3 8) → 11
ELISP> (+ (- 2 3) (* 2 4))7
That was also quite intuitive:
(+ (- 2 3) (* 2 4)) → (+ -1 (* 2 4)) → (+ -1 8) → 7
ELISP> (concat "I love " "programming" (concat " despite of " "my" " age"))"I love programming despite of my age"(concat "I love " "programming" (concat " despite of " "my" " age")) →(concat "I love " "programming" " despite of my age") →"I love programming despite of my age"
This is how function call forms are evaluated in Lisp: first thearguments are evaluated, in first-to-last order, then the functioncall is made. Arguments can be arbitrary expressions.
ELISP> 2020ELISP> "Luck""Luck"
Simple numbers and strings are evaluated to themselves. In other words, they are self-evaluating.
Data types
A Lisp system operates with objects (not in the C++ sense); anydatum is an object. Object have types stuck to them at run time (ducktyping). Main object types are these:
- a number (integer or floating-point, for instance)
- a string
- a cons cell (a pair)
- a symbol
As it was mentioned above, all Lisp dialects allow for much widerrange of types. But these 4 are exceptional, because they'reindispensable and a key to the idea of Lisp. That's why we focus onthese 4.
Numbers and strings are ordinary types, just like numbers andstrings in any programming language. Strings are generally mutable inLisp, as in C++.
A cons cell (or a pair) is just an ordered pair of Lisp objects.First object and second object, which are historically called "CAR"and "CDR". Cells are implemented as a pair of pointers. Very simple:just 2 pointers.
Cons cells are important because that's what lists are made of. InLisp, lists are singly-linked. A list is just a cons cell whose CARcontains a pointer to arbitrary Lisp object, and whose CDR containseither pointer to the next cons cell or NIL, which is a special thing(symbol) to denote empty lists: