The Story of Squeak


Smalltalk to C Translation

We have alluded to the Squeak philosophy of writing everything in Smalltalk. While the Blue Book contains a Smalltalk description of the virtual machine that was actually executed at least once to verify its accuracy, this description was meant to be used only as an explanatory model, not as the source code of a working implementation. In contrast, we needed source code that could be translated into C to produce a reliable and efficient virtual machine.

Our bootstrapping strategy also depended on being able to debug the Smalltalk code for the Squeak virtual machine by running it under an existing Smalltalk implementation, and this approach was highly successful. Being able to use the powerful tools of the Smalltalk environment saved us weeks of tedious debugging with a C debugger. However, useful as it is for debugging, the Squeak virtual machine running on top of Smalltalk is orders of magnitude too slow for useful work: running in Squeak itself, the Smalltalk version of the Squeak virtual machine is roughly 450 times slower than the C version. Even running in the fastest available commercial Smalltalk, the Squeak virtual machine running in Smalltalk would still be sluggish.

The key to both practical performance and portability is to translate the Smalltalk description of the virtual machine into C. To be able to do this translation without having to emulate all of Smalltalk in the C runtime system, the virtual machine was written in a subset of Smalltalk that maps directly onto C constructs. This subset excludes blocks (except to describe a few control structures), message sending, and even objects! Methods of the interpreter classes are mapped to C functions and instance variables are mapped to global variables. For byte code and primitive dispatches, the special message dispatchOn:in: is mapped to a C switch statement. (When running in Smalltalk, this construct works by perform:-ing the message selector at the specified index in a case array; since a method invocation is much less efficient than a branch operation, this dispatch is one of the main reasons that the interpreter runs so much faster when translated to C).

The translator first translates Smalltalk into parse trees, then uses a simple table-lookup scheme to generate C code from these parse trees. There are only 42 transformation rules, as shown in Table 3. Four of these are for integer operations that more closely match those of the underlying hardware, such as unsigned shifts, and the last three are macros for operations so heavily used that they should always be inlined. All translated code accesses memory through six C macros that read and write individual bytes, 4-byte words, and 8-byte floats. In the early stages of development, every such reference was checked against the bounds of object memory.

& | and: or: not
+ - * // \\ min: max:
bitAnd: bitOr: bitXor: bitShift:
< <= = > >= ~= ==
isNil notNil
whileTrue: whileFalse: to:do: to:by:do:
ifTrue: ifFalse: ifTrue:ifFalse: ifFalse:ifTrue:
at: at:put:
<< >> bitInvert32 preIncrement integerValueOf:
integerObjectOf: isIntegerObject: 

Table 3: Operations of primitive Smalltalk

Our first translator yielded a two orders of magnitude speedup relative to the Smalltalk simulation, producing a system that was immediately usable. However, one further refinement to the translator yielded a significant additional speedup: inlining. Inlining allows the source code of the virtual machine to be factored into many small, precisely defined methods—thus increasing code-sharing and simplifying debugging—without paying the penalty in extra procedure calls. Inlining is also used to move the byte code service routines into the interpreter byte code dispatch loop, which both reduces byte code dispatch overhead and allows the most critical VM state to be kept in fast, register-based local variables. All told, inlining increases VM performance by a factor of 3.4 while increasing the overall code size of the virtual machine by only 13%.