Vulcan - Non-Blocking Memory Manager

From Jim Starkey on the Firebird Development List 28th November 2004

The Vulcan memory manager (MemMgr.cpp) handles large and small blocks differently. Large blocks are allocated on a best fit basis, and are recombined with other free blocks on release. Small blocks, however, are cached by (rounded up) size for re-use. Like the original InterBase memory manager, the cache is managed with a vector of list heads indexed by block size divided by rounding size. In the past, the memory manager data structures have been protected by a mutex.
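The per-size cache described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual MemMgr.cpp code; the rounding granularity, the 4K small-block limit, and the names are assumptions drawn from the description.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

const std::size_t ROUNDING = 8;       // assumed rounding granularity
const std::size_t SMALL_LIMIT = 4096; // small-block threshold (< 4K per the text)

struct FreeBlock { FreeBlock* next; };

// Vector of free-list heads, indexed by rounded block size / rounding size.
std::vector<FreeBlock*> freeLists(SMALL_LIMIT / ROUNDING + 1, nullptr);

std::size_t roundUp(std::size_t n) { return (n + ROUNDING - 1) & ~(ROUNDING - 1); }

void* allocateSmall(std::size_t size) {
    std::size_t slot = roundUp(size) / ROUNDING;
    if (FreeBlock* block = freeLists[slot]) { // reuse a cached block of this size
        freeLists[slot] = block->next;
        return block;
    }
    return std::malloc(roundUp(size));        // otherwise take a fresh allocation
}

void releaseSmall(void* memory, std::size_t size) {
    std::size_t slot = roundUp(size) / ROUNDING;
    FreeBlock* block = static_cast<FreeBlock*>(memory);
    block->next = freeLists[slot];            // push onto the per-size list
    freeLists[slot] = block;
}
```

Releasing a block and then allocating the same size hands the same memory straight back, which is what makes the cache cheap.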

It occurred to me that the small block free list vector could be managed with interlocked compare-exchange instructions, so re-allocating an available small block or releasing a small block would not require use of the mutex. This has two huge advantages. First, it is much faster, requiring a single interlocked instruction, where the fastest possible mutex implementation requires at least two. Second, it is non-blocking, meaning that a thread allocating or releasing a small block is never blocked and never blocks another thread. Since the overwhelming majority of memory allocations are for small blocks (< 4K), this has a large effect on single-CPU performance and a huge effect on SMP scalability.
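In modern C++ terms, the compare-exchange management of a free-list head might look like the sketch below. This is my illustration of the technique, not Vulcan's code (which would have used platform interlocked primitives directly); note that a production allocator also has to deal with the classic ABA hazard of lock-free pop, e.g. with a version-tagged head, which is omitted here for brevity.

```cpp
#include <atomic>
#include <cassert>

struct Node { Node* next; };

std::atomic<Node*> head{nullptr}; // one such head per rounded block size

// Non-blocking release: a single compare-exchange pushes the block.
void push(Node* node) {
    Node* old = head.load(std::memory_order_relaxed);
    do {
        node->next = old;
    } while (!head.compare_exchange_weak(old, node,
                                         std::memory_order_release,
                                         std::memory_order_relaxed));
}

// Non-blocking re-allocation: a single compare-exchange pops a block,
// or returns nullptr when the cache for this size is empty.
// (ABA-unsafe as written; a real allocator must address that.)
Node* pop() {
    Node* old = head.load(std::memory_order_acquire);
    while (old && !head.compare_exchange_weak(old, old->next,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed))
        ;
    return old;
}
```

No thread ever waits on another here: a failed compare-exchange simply retries with the freshly observed head, which is the non-blocking property the text claims.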

For reference, my memory manager testbed replays a series of 400K allocations/releases captured from Netfrastructure. The testbed computes the CPU time to execute the complete sequence 10 times. Running on a 1.3 GHz Athlon under WinXP with a debug compile, the Firebird 2.0 memory allocator runs in 6.2 seconds, the previous Vulcan memory manager in about 1.4 seconds, and the new memory manager in about 1.1 seconds.

Compiled for release, the Firebird 2.0 memory manager runs in about 3.6 seconds and the new Vulcan memory manager in 0.7 seconds. This translates to an average memory allocate and release cycle of 460 nanoseconds. Since the test runs include many large block allocations and initial small block allocations, the effective average memory allocate and release cycle within a long running server process will be below this number.

As an aside, this is another coffin nail in the legacy Firebird pool architecture. Since Firebird doesn't normally release blocks, pool-specific allocations are all initial allocations that incur the full mutex acquire and release cost. With the new memory manager, it is much cheaper to allocate and release a block from a shared pool than to merely allocate it from a private pool.

I didn't expect to run into a situation where releasing memory is a CPU performance enhancement, but here it is. Given infinite memory, it is cheaper to reuse memory than to allocate memory and abandon it.