The Story of Squeak


The Object Memory

The design of an object memory that is general and yet compact is not simple. We all agreed immediately on a number of parameters, though. For efficiency and scalability to large projects, we wanted a 32-bit address space with direct pointers (i.e., a system in which an object reference is just the address of that object in memory). The design had to support all object formats of our existing Smalltalk. It must be amenable to incremental garbage collection and compaction. Finally, it must be able to support the "become" operation (exchange identity of two objects) to the degree required in normal Smalltalk system operation. While anyone would agree that objects should be stored compactly, every object in Smalltalk requires the following "overhead" information:

A simple approach would be to allocate three full 32-bit words as the header to every object. However, in a system of 40k objects, this cavalier expenditure of 500k bytes of memory could make the difference between an undeployable prototype and a practical application. Therefore, we designed a variable-length header format which seldom requires more than a single 32-bit word of header information per object. The format is given in Tables 1 and 2.

offset contents occurrence
-8 size in words (30 bits), header type (2 bits) 1%
-4 full class pointer (30 bits), header type (2 bits) 18%
0 base header, as follows...
storage management (3 bits)
object hash (12 bits)
compact class index (5 bits)
object format field (4 bits, see below)
size in words (6 bits)
header type (2 bits)
100%

Table 1: Format of a Squeak object header
0 no fields
1 fixed pointer fields
2 indexable pointer fields
3 both fixed and indexable pointer fields
4 unused
5 unused
6 indexable word fields (no pointers)
7 unused
8-11 indexable byte fields (no pointers):
low 2 bits are low 2 bits of size in bytes
12-15 compiled methods: low 2 bits are low 2 bits of size in bytes.
The number of literals is specified in method header, followed by the indexable bytes that store byte codes.

Table 2: Encoding of the object format field in a Squeak object header

Our design is based on the fact that most objects in a typical Smalltalk image are small instances of a relatively small number of classes. The 5-bit compact class index field, if non-zero, is an index into a table of up to 31 classes that are designated as having compact instances; the programmer can change which classes these are. The 6-bit size field, if non-zero, specifies the size of the object in words, accommodating sizes up to 256 bytes (i.e., 64 words, with the additional 2 bits needed to resolve the length of byte-indexable objects encoded in the format field). With only 12 classes designated as compact in the 1.18 Squeak release, around 81% of the objects have only this single word of overhead. Most of the rest need one additional word to store a full class pointer. Only a few remaining objects (1%) are large enough to require a third header word to encode their size, and this extra word of overhead is a tiny fraction of their size.