back to listing index

Let's Build a Simple Interpreter, Part 1 (2015) | Hacker News

[web search]
Original source (news.ycombinator.com)
Tags: diy compilers interpreter programming-language-design
Clipped on: 2016-09-13

Image (Asset 1/2) alt= Hacker News new | threads | comments | show | ask | jobs | submit jaytaylor (2304) | logout
Let's Build a Simple Interpreter, Part 1 (2015) (ruslanspivak.com)
142 points by rspivak 1 day ago | unvote | flag | hide | past | web | 25 comments | favorite




Image (Asset 2/2) alt=

From the article: "If you don't know how compilers work, then you don't know how computers work."

I still don't understand how JIT compilers work, which is probably the majority of compilers that I'm using today. It seems that there aren't many resources or established textbooks for this area. I feel we're still left in the dark in this age.


A basic JIT is no different from a normal compiler. Just write output code to memory rather than to a file, and only compile each function as needed. It gets more complicated if you want to compete with the performance of V8 or something of course but the basics are the same.

It's a bit more complex than that. First a bit of clarification on terminology: it seems like everything that's compiling code at runtime is called JIT, but I don't like that definition. For a language implementation to qualify as JIT, it has to have an interpreter, a compiler and passing control between compiled and interpreted code.

This is where the complexity is, managing when to JIT compile a function and how to pass control back to the interpreted code. There are so many ways to do this, from compiling individual functions to doing a tracing JIT like luajit is.

So, yes, the compiler part of a JIT is "just" a compiler that outputs code to memory but the complexity is doing it just-in-time. Start executing code in an interpreter, find a hot path, then compile that and make the compiler emit instructions that return the control back to the interpreter when executing the hot path is done.


> For a language implementation to qualify as JIT, it has to have an interpreter, a compiler and passing control between compiled and interpreted code.

So you wouldn't consider the .NET framework a JIT? It never interprets any code, just compiles things as they are used.


First of all: it's a bit of an ill-defined term and someone else might have a different definition that contradicts mine.

I'm not too familiar with .NET but based on your description, no I would not qualify that as a JIT according to the definition I gave above.

If you could as well do it ahead of time, I would not call it a JIT.

Is the reason for .NET's late compilation to generate code for the target CPU architecture? Or does it do some optimization based on runtime information (this would qualify as JIT)?

It doesn't necessarily have to be a (bytecode) interpreter to call it a JIT, but it does have to compile code at runtime and have the control return back to the framework for compiling more code as needed.


I don't believe it does any runtime optimization, but it does compile code on-demand. Compiled code returns control to the compiler, but not interpreted code since there is none. IIRC this is done in a similar manner to garbage collection pauses (via interception of bad memory accesses).

If the control returns from the compiled code, I'd call it a JIT.

Do you know why it's doing incremental compilation? Is there something that can't be done ahead of time? E.g. inlining dynamic ("virtual function") calls and then applying optimization?


Mostly to avoid compiling much code that doesn't actually get executed. For the most part you can do everything ahead of time via static analysis (Xamarin's iOS toolchain does this). The only exception is access of code via reflection, which is harder to statically analyze and will also invoke the JIT.

Or V8? V8 doesn't have an interpreter (it is now getting one), but is perhaps the major example of a JIT most people would know.

Saying you need an interpreter is clearly a silly definition of a JIT.


V8 is definitely a JIT. It might not have an interpreter in the classical meaning of the word, but it does incrementally compile more as type information becomes available.

It compiles some code, executes it and control returns back to either interpreted code or the framework that decides what to compile next. It's a JIT, because it can't be done ahead of time.

Just doing the final stage of code generation at runtime/startup (like, say, PNaCL does) would not qualify. You could as well generate code for all target architectures ahead of time, but it might be more practical to leave that to the client.

Feel free to disagree with my definition, but I like to draw a distinction between just compiling code at runtime and a JIT, that compiles/optimizes code as it becomes possible.


> I like to draw a distinction between just compiling code at runtime and [something else] that compiles/optimizes code as it becomes possible

I agree with that, but I call the first thing a JIT, and the second thing a dynamic compiler. There's a PhD someone wrote about the difference but I can't find it now.


I guess we can agree that the term is somewhat ill defined but the distinction exists :)

Wait a second, I thought .NET is interpreted? I mean take C#. Isn't it compiled to IL and then run on CLR VM? And has JIT to speed it up?

EDIT: spelling


The JIT always runs - there's no interpreter, so if your need to run a method the first thing the VM does is to JIT compile it.

This has some disadvantages - you have to wait for the JIT to run and can't keep running in the interpreter while it compiles, and some advantages - it's simpler.


No, when .NET came out computers had gotten fast enough that the JIT pauses around startup were manageable, so they didn't use interpretation as a stopgap like the JVM did (which was created much earlier).

The statement is false on the face of it. I've learned over past year or so how computers work down to the gates with fragmented understanding of analog that implements them. I've built compilers before, too. The subjects are orthogonal: you can fully understand what a computer is doing without ever understanding how compilers work. It's called assembly language. :)

I think the author really meant to say you have to understand some amount of how computers work and how compilers translate HLL's to them in order to fully understand why your HLL programs behave the way they do. That's the charitable interpretation. I'm actually writing this comment on a laptop sitting on a book that did that for me. It's called Write Great Code by Randall Hyde. Guy who made Art of Assembly Programming and HLA. It was enlightening.


Yeah, I think a more appropriate statement would have been "If you don't know how compilers work, you don't know how software works".

Again, software in most HLL's that uses compilers. There's assembly, macro assemblers, and some HLL's close to the metal. For the later, you can get a list of what various constructs will look like as assembly, heuristics on what to use where, and still don't need to know compilers.

But almost all software today is written in a high level language (above C).

And even in C, optimizations can change your code quite a lot.

So I do think having a basic udnerstanding of how compilers work under the hood should be required for every programmer.

As I said elswhere, every proper programmer should write at least one toy language with a compiler and / or interpreter.


The problem I always have with this kind of article is that even though they, like this one, are often well written and clear they skip all discussion of the choice of language to implement as though this is somehow completely irrelevant to the task of creating an interpreter.

But it is not. An interpreter for a curly bracket language or Algol descendant (Algol, C, C++. C#, VB.Net, Pascal) looks quite different from one for a stack based language (the canonical one being Forth I suppose) and different again from on that uses S-expressions (Lisp, Scheme, etc.), and so on.

A simple interpreter for a stack based language can be a lot shorter than one for a block structured language


Yes, this is a simple expression parser, I transpiled this tutorial to C earlier this year [0]. If by some chance you would like to build my C version, you would need two other libraries that I built too. Compilers is the most interesting CS topic for me. That and language runtimes. If I could go to grad school, that's what I would want to research.

I also emulated Python's `raw_input` function to my own C version :)

It was also a chance for me to use `setjmp` to implement exception handling in C. Take a look at this commit: https://github.com/rmccullagh/calculator/commit/6afac5e5924b...

[0] https://github.com/rmccullagh/calculator


I've been following this tutorial recently and I've enjoyed it quite a bit. I think I understand grammars a little bit more as a result. If this style is interesting to you the author also has a series on building a web server from scratch. https://ruslanspivak.com/lsbaws-part1/

Neat series. Another book to add to the recommendation list is Game Scripting Mastery which is a great text for learning to create scripting languages, compilers, and virtual machines. Don't judge it by the title, it's a gem. https://www.amazon.com/Scripting-Mastery-Premier-Press-Devel...

Java version to play with:

https://codeboard.io/projects/9288

It's interesting to see how much "overhead" static typing and language verbosity generates. But maybe there are also ways to write it more compact in Java, not sure.





Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: