The Story of Squeak


Storage Managment

Apple Smalltalk had achieved good garbage collection behavior with a simple two-generation approach similar to [Unga84]. At startup, and after any full garbage collection (a mark and sweep of the entire image), all surviving objects were considered to be old, and all objects created subsequently (until the next full collection) to be new. All pointer stores were checked and a table maintained of "root" objects—old objects that might contain pointers to new objects. In this way, an incremental mark phase could be achieved by marking all new objects reachable from these roots and sweeping the new object area; unmarked new objects were garbage. Compaction was simple in that system, owing to its use of an object table. Full garbage collection was triggered either by an overflow of the roots table, or by failure of an incremental collection to reclaim a significant amount of space. That system was known to run acceptably with less than 500k of free space and to perform incremental reclamations in under 250 milliseconds on hardware of the 80's (16MHz 68020).

For Squeak, we determined to apply the same approach to our new system of 32-bit direct pointers. We were faced immediately with a number of challenges. First, we had to write an in-place mark phase capable of dealing with our variable-length headers, including those that did not have an actual class pointer in them. Then there was the need to produce a structure for remapping object pointers during compaction, since we did not have the convenient indirection of an object table. Finally there was the challenge of rectifying all the object pointers in memory within an acceptable time.

The remapping of object pointers was accomplished by building a number of relocation blocks down from the unused end of memory. A thousand such blocks are reserved outside the object heap, ensuring that at least one thousand objects can be moved even when there is very little free space. However, if the object heap ends with a free block, that space is also used for relocation blocks. If there is not enough room for the number of relocation blocks needed to do compaction in a single pass (almost never), then the compaction may be done in multiple passes. Each pass generates free space at the end of the object heap which can then be used to create additional relocation blocks for the next pass.

One more issue remained to be dealt with, and that was support of the become operation without an object table. (The Smalltalk become primitive atomically exchanges the identity of two objects; to Smalltalk code, each object appears to turn into, or "become," the other.) With an object table, the become primitive simply exchanges the contents of two object table entries. Without an object table, it requires a full scan of memory to replace every pointer to one object with a pointer to the other. Since full memory scans are relatively costly, we made two changes. First, we eliminated most uses of become in the Squeak image by changing certain collection classes to store their elements in separate Array objects instead of indexed fields. However, become operations are essential when adding an instance variable to a class with extant instances, as each instance must mutate into a larger object to accommodate the new variable. So, our second change was to restructure the primitive to one that exchanges the identity of many objects at once. This allows all the instances of a class to be mutated in a single pass through memory. The code for this operation uses the same technique and, in fact, the very same code, as that used to rectify pointers after compaction.

We originally sought to minimize compaction frequency, owing to the overhead associated with rectifying direct addresses. Our strategy was to do a fast mark and sweep, returning objects to a number of free lists, depending on size. Only when memory became overly fragmented would we do a consolidating compaction.

As we studied and optimized the Squeak garbage collector, however, we were able radically to simplify this approach. Since an incremental reclamation only compacts the new object space, it is only necessary to rectify the surviving new objects and any old objects that point to them. The latter are exactly those objects marked as root objects. Since there are typically just a few root objects and not many survivors (most objects die young), we discovered that compaction after an incremental reclamation could be done quickly. In fact, due to the overhead of managing free lists, it turned out to be more efficient to compact after every incremental reclamation and eliminate free lists altogether. This was especially gratifying since issues of fragmentation and coalescing had been a burden in design, analysis, and coding strategy.

Two policy refinements reduced the incremental garbage collection pauses to the point where Squeak became usable for real-time applications such as music and animation. First, a counter is incremented each time an object is allocated. When this counter reaches some threshold, an incremental collection is done even if there is plenty of free space left. This reduces the number of new objects that must be scanned in the sweep phase, and also limits the number of surviving objects. By doing a little work often, each incremental collection completes quickly, typically in 5-8 milliseconds. This is within the timing tolerance of even a fairly demanding musician or animator.

The second refinement is to tenure all surviving objects when the number of survivors exceeds a certain threshold, a simplified version of Ungar and Jackson's feedback-mediated tenuring policy [UnJa88]. Tenuring is done as follows. After the incremental garbage collection and compaction, the boundary between the old and new object spaces is moved up to encompass all surviving new objects, as if a full garbage collection had just been done. This "clears the decks" so that future incremental compactions have fewer objects to process. Although in theory this approach could hasten the onset of the next full garbage collection, such full collections are rare in practice. In any case, Squeak's relatively lean image makes full garbage collections less daunting than they might be in a larger system; a full collection typically takes only 250 milliseconds in Squeak. We have been using this storage manager in support of real-time graphics and music for over a year now with extremely satisfactory results. In our experience, 10 milliseconds is an important threshold for latency in interactive systems, because most of the other critical functions such as mouse polling, sound buffer output and display refresh take place at a commensurate rate.