Friday, December 16, 2016

the malloc challenge - progress update #1

Hi, and welcome to this progress update from the malloc challenge. If you didn't get the original memo, I recommend starting here. Discussions and submissions are directed to the posts on Reddit and Hacker News.


The challenge got an overwhelming response, leading to many interesting discussions. I'm happy to see that so many are still awake out there; it warms my heart.

Some unfortunately missed the point I'm trying to make completely, and got tangled up in giving reasons why writing a system-level, general-purpose malloc is worse than blindfolded rocket surgery. I know; that is why I think it's about time we put perfect back on the shelf and shift focus to solving specific problems in a good-enough, modular, application-level framework.

Others were confused by the word 'challenge' and assumed I was announcing a competition, without any prizes. Some went on to stroke their egos, before sneaking back into security with arbitrary excuses. This is where competing leads us: a constant fight-or-flight mode where substance is replaced with drama. Even science recognizes we're all part of one whole; you lose, I lose, everyone loses.


The benchmark has been updated to use the same random seed for each allocator, to additionally allocate several blocks of each size, and to access the allocated memory, causing access errors for invalid allocations.
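To make those three changes concrete, here is a minimal sketch of such a benchmark pass. It is not the actual libc4life benchmark; the names run_bench, acq_fn and rel_fn and the constants are made up for illustration, and timing is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator signatures, for illustration only. */
typedef void *(*acq_fn)(size_t size);
typedef void (*rel_fn)(void *ptr);

enum { ROUNDS = 1000, REPEAT = 4, MAX_SIZE = 512 };

/* Runs one pass and returns the total number of bytes touched. */
size_t run_bench(unsigned seed, acq_fn acq, rel_fn rel) {
  static void *ptrs[ROUNDS * REPEAT];
  size_t total = 0, n = 0;

  assert(acq && rel);
  srand(seed); /* same seed => same size sequence for every allocator */

  for (int i = 0; i < ROUNDS; i++) {
    size_t size = (size_t)(rand() % MAX_SIZE) + 1;

    for (int j = 0; j < REPEAT; j++) { /* several blocks of the same size */
      void *p = acq(size);
      if (!p) continue;
      memset(p, 0xff, size); /* touch the memory; an invalid block faults here */
      ptrs[n++] = p;
      total += size;
    }
  }

  while (n) rel(ptrs[--n]); /* hand everything back */
  return total;
}
```

Calling run_bench(42, malloc, free) twice gives the same total, which is the whole point of fixing the seed: every allocator gets the exact same workload.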


Implementation-specific freeing functions have been replaced with a 'free' method in the c4malloc interface. 'acquire' has been shortened to 'acq' and 'release' to 'rel'.
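For readers who haven't looked at the code, an interface of this shape could be sketched roughly like this. The struct layout and every name except acq, rel and free are my assumptions here, not the real c4malloc definition; I'm also assuming that 'free' tears down the allocator itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout; the real c4malloc struct may differ. */
struct allocator {
  void *(*acq)(struct allocator *self, size_t size); /* get a block */
  void (*rel)(struct allocator *self, void *ptr);    /* give it back */
  void (*free)(struct allocator *self);              /* tear down the allocator */
  size_t live; /* outstanding blocks, only here for illustration */
};

static void *basic_acq(struct allocator *self, size_t size) {
  void *p = malloc(size);
  if (p) self->live++;
  return p;
}

static void basic_rel(struct allocator *self, void *ptr) {
  if (!ptr) return;
  assert(self->live > 0);
  free(ptr);
  self->live--;
}

static void basic_free(struct allocator *self) { free(self); }

/* Builds a trivial allocator that forwards to malloc/free. */
struct allocator *basic_new(void) {
  struct allocator *a = malloc(sizeof *a);
  if (!a) return NULL;
  a->acq = basic_acq;
  a->rel = basic_rel;
  a->free = basic_free;
  a->live = 0;
  return a;
}
```

The benefit of one shared 'free' entry point is that code stacking allocators on top of each other doesn't need to know which implementation it is holding.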


An mmap allocator has been added; it allocates each request using anonymous mmap to avoid malloc's internal bookkeeping costs. The slab allocator and free list have seen plenty of experimentation with indexing slabs and allocations by size, but finally reverted to using linked lists. Hours of full-stack optimization to chisel libc4life's binary set into a viable alternative still resulted in a 300% slowdown, and I can't think of a faster way to do it right now.
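The basic idea of one anonymous mapping per request can be sketched like this. It is not the libc4life implementation; mmap_acq, mmap_rel and the header union are names I made up, and I'm assuming a POSIX system that provides MAP_ANONYMOUS. The mapping length is stashed in a small header because munmap needs it.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Header at the start of each mapping so release knows the length. */
typedef union {
  size_t len;
  max_align_t align; /* keeps the user pointer suitably aligned */
} mmap_hdr;

void *mmap_acq(size_t size) {
  size_t len = size + sizeof(mmap_hdr);
  if (len < size) return NULL; /* size overflow */

  void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (m == MAP_FAILED) return NULL;

  ((mmap_hdr *)m)->len = len;
  return (char *)m + sizeof(mmap_hdr);
}

void mmap_rel(void *ptr) {
  if (!ptr) return;
  mmap_hdr *h = (mmap_hdr *)((char *)ptr - sizeof(mmap_hdr));
  assert(h->len >= sizeof(mmap_hdr));
  munmap(h, h->len);
}
```

Note that every request costs a system call and at least one page, so on its own this only makes sense for large blocks or as a backend for something that batches requests.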


The benchmark improvements dissolved most performance oddities; all allocators now exhibit the expected relative performance. The short story is that slab allocation via mmap performs best, significantly better than straight malloc because of the reduced number of actual allocations and the lack of bookkeeping; this relation holds when stacking pools or free lists on top.
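To show where the savings come from, here is a stripped-down sketch of slab allocation on top of mmap: one mapping per slab, blocks carved off by bumping an offset, slabs kept in a linked list. The names, the 1 MiB slab size and the all-at-once teardown are assumptions for illustration, not the libc4life code.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

enum { SLAB_SIZE = 1 << 20 }; /* 1 MiB per slab, picked arbitrarily */

struct slab {
  struct slab *next; /* linked list of slabs */
  size_t used;       /* bytes handed out, including this header */
};

struct slabs {
  struct slab *head;
  size_t nslabs;
};

/* Rounds n up to the strictest fundamental alignment. */
static size_t align_up(size_t n) {
  size_t a = _Alignof(max_align_t);
  return (n + a - 1) & ~(a - 1);
}

void *slab_acq(struct slabs *s, size_t size) {
  size_t hdr = align_up(sizeof(struct slab));
  assert(s);
  if (size > SLAB_SIZE - hdr) return NULL; /* oversized requests not handled here */
  size = align_up(size);

  if (!s->head || s->head->used + size > SLAB_SIZE) {
    /* current slab is missing or full: map a new one */
    void *m = mmap(NULL, SLAB_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return NULL;
    struct slab *n = m;
    n->next = s->head;
    n->used = hdr;
    s->head = n;
    s->nslabs++;
  }

  void *p = (char *)s->head + s->head->used; /* bump allocation, no per-block header */
  s->head->used += size;
  return p;
}

/* Unmaps every slab; all blocks become invalid at once. */
void slabs_free(struct slabs *s) {
  while (s->head) {
    struct slab *next = s->head->next;
    munmap(s->head, SLAB_SIZE);
    s->head = next;
  }
  s->nslabs = 0;
}
```

Thousands of small requests end up as a handful of mmap calls, and there is no per-block header at all; releasing individual blocks is what a pool or free list stacked on top would add.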


I still haven't given up hope on external contributions. The repository has seen plenty of forks, but nothing has crossed my radar yet. Please share your ideas, however naive or imperfect they may seem to you. This is not a competition; you have nothing to lose and everything to gain.

peace, out