Chris Baus

Dynamic memory failure on Linux is fatal

Recovering from dynamic memory allocation failure is difficult under the typical scheme where objects are allocated and deleted on demand. If malloc fails, your app is in for a heap of trouble (pun intended). When the OS is short on memory, it can start behaving unpredictably, with possible live-lock and stalling.

This is why I pre-allocate my memory in the SwitchFlow Reverse Proxy. I don't want to try to recover when the system is running so low on resources that I can't allocate my state machine data when handling a new connection. I prefer that admins know what memory requirements they are up against at server startup.

Very few C++ application programmers, myself included, have any idea what happens when operator new throws in their app, because we don't write custom new operators that purposely fail during testing. I do know what happens in my proxy server, though. It never starts.
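For anyone who does want to see what their app does when new throws, a replacement global operator new that fails on demand is not much code. This is only a minimal sketch: the fail_new_after() hook and its counter are names I made up for illustration, it is not thread-safe, and it does not cover the aligned or nothrow overloads.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical test hook: how many more allocations may succeed before
// operator new starts throwing. A negative value disables fault injection.
static long g_allocs_until_failure = -1;

void fail_new_after(long successful_allocations) {
    g_allocs_until_failure = successful_allocations;
}

// Replacement for the global operator new. The default operator new[]
// forwards to this function, so array allocations are covered as well.
void* operator new(std::size_t size) {
    if (g_allocs_until_failure == 0) {
        throw std::bad_alloc();  // injected failure; stays on until reset
    }
    if (g_allocs_until_failure > 0) {
        --g_allocs_until_failure;
    }
    if (void* p = std::malloc(size != 0 ? size : 1)) {
        return p;
    }
    throw std::bad_alloc();  // genuine failure reported by malloc
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```

A test would call fail_new_after(n) for increasing n around the code under test and check that every run either succeeds or fails cleanly, then call fail_new_after(-1) to turn injection off again.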

But the issue is way more complicated, as both Linux and FreeBSD overcommit memory by default. That means malloc will not fail even if there is not enough physical memory available to meet the request. Memory pages aren't physically committed until they are written to. If applications start committing more pages than are available to the OS, the OS starts killing off processes. There is no chance to catch an exception from new, because more than likely your application is already dead before this happens. It blows a lot of developers' minds when they learn this after years of assuming an academic implementation of C++.

# man malloc
...By default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available. This is a really bad bug. In case it turns out that the system is out of memory, one or more processes will be killed by the infamous OOM killer.

In this classic rant, Linus didn't see this as a bug when PC resources were constrained. He thought 17MB of swap was a lot at the time. By comparison, I have 1 gig of swap on the machine I'm typing this on.

He did back down in the 2.6 kernel, which now offers a non-overcommitted mode. This is set on a running kernel with:
echo 2 > /proc/sys/vm/overcommit_memory

This is still somewhat of a heuristic as the OS does allow arbitrary file mappings through mmap(). I'm considering setting this option when the server starts, as I can then make better guarantees about its behavior.

It is worth noting that NT (I can't call the Windows kernel anything other than NT) has a drastically different VM architecture, which I call a two-phase allocation model. Memory address space can be reserved by applications, and later the physical memory can be committed, using the VirtualAlloc() API. Most C++ heap implementations use NT's built-in heap implementation, which combines both stages, so if you use the standard new operator you would never know that you've really performed two orthogonal operations.

This is going to sound like heresy, but I think most application developers are fine to ignore the possibility that new will throw. Recovering from such situations is nearly impossible anyway, as the process that is responsible for depleting memory is probably allocating it as fast as your app is freeing it. Plus if you happen to throw again from an exception handler (for example by allocating a string in the exception), you are going straight to terminate() anyway. If you don't think your app can tolerate failure in low memory situations, you should consider an alternate memory management strategy.
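For those who do try to report failure in low memory, one way to avoid allocating a string in the exception is to carry the message in a fixed-size buffer. This is only a sketch, and FixedMessageError and its 128-byte buffer are my own invention. It also does not solve the whole problem, since the C++ runtime still has to find storage for the exception object itself.

```cpp
#include <cstring>
#include <exception>

// Exception type whose message lives in a fixed array inside the object,
// so constructing or copying it never calls operator new.
class FixedMessageError : public std::exception {
public:
    explicit FixedMessageError(const char* message) noexcept {
        // Copy at most 127 characters and always terminate the string.
        std::strncpy(buffer_, message, sizeof(buffer_) - 1);
        buffer_[sizeof(buffer_) - 1] = '\0';
    }

    const char* what() const noexcept override { return buffer_; }

private:
    char buffer_[128];
};
```

Messages longer than the buffer are silently truncated, which seems like a fair trade when the alternative is a second allocation failure.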

For high availability systems, single purpose Virtual Machines like those provided by VMware are the future. It might sound like my server is pursuing a pretty aggressive memory strategy by pre-allocating and using non-overcommitted memory, but if you assume it is in its own little world, where no other processes really matter, then it isn't so bad. It won't butt heads with the web server running in a separate VM. Plus if you look at how VMware works, it gives the admin very precise control over the amount of memory available, so the admin could say, "I'm giving the proxy VM 2 gigs of RAM," and as an application developer I might as well take all of it.

Update: MPH convinced me to rethink my statements on mlockall(). You can read the comments for more discussion on that.

Update: Paul linked to my memory management article from the Apache mailing list. Thanks.

There was some discussion on setrlimit(). There are a couple of problems here. setrlimit() doesn't do you a lot of good if you set it to 2 gigs and there are only 50k left in the system. You're still fodder for the OOM-killer (and maybe mysqld is too, which is what happened to me during some testing today).
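To make clear what setrlimit() does and doesn't give you, here is a small sketch that temporarily lowers the soft RLIMIT_AS and tries a malloc under it. The function and enum names are hypothetical. All it shows is that the limit is a per-process ceiling on address space: a request above the cap gets refused, but nothing below the cap is reserved for you.

```cpp
#include <sys/resource.h>

#include <algorithm>
#include <cstddef>
#include <cstdlib>

enum class CapResult { Error, Refused, Allocated };

// Lowers the soft address-space limit to `cap` bytes, attempts one
// malloc of `request` bytes, then restores the previous soft limit.
CapResult try_malloc_under_cap(rlim_t cap, std::size_t request) {
    rlimit old_limit{};
    if (getrlimit(RLIMIT_AS, &old_limit) != 0) {
        return CapResult::Error;
    }

    rlimit capped = old_limit;
    // The soft limit may never exceed the hard limit.
    capped.rlim_cur = std::min(cap, old_limit.rlim_max);
    if (setrlimit(RLIMIT_AS, &capped) != 0) {
        return CapResult::Error;
    }

    void* p = std::malloc(request);
    if (p != nullptr) {
        // Volatile write keeps the compiler from optimizing the
        // malloc/free pair away.
        *static_cast<volatile char*>(p) = 1;
    }
    const CapResult result =
        (p != nullptr) ? CapResult::Allocated : CapResult::Refused;
    std::free(p);

    setrlimit(RLIMIT_AS, &old_limit);  // raise the soft limit back
    return result;
}
```

With a 256MB cap, a 1 gig request should come back as Refused. A request under the cap can still succeed on a machine that has nothing left to back it, which is exactly the problem.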

Plus read the man page closely.

Also automatic stack expansion will fail (and generate a SIGSEGV that kills the process when no alternate stack has been made available).
Red lights should go off here, because stack exceptions fall in the "you're totally screwed" category. There is a chance that your next call to please_help_im_out_of_memory(); is going to blow the stack. Not good. If you want to do something sane about OOM, hunk off a big piece at startup, memset it, and if you are still alive, pool it. Unfortunately, there is always the possibility that growing the stack could fail to commit memory, which will cause the program to fail.
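The "hunk off a big piece, memset it, pool it" idea might look something like the following. This is a minimal sketch and not the SwitchFlow code: FixedPool is a made-up name, it is single-threaded, and it doesn't check the size multiplication for overflow. The memset uses a nonzero fill on purpose, because some compilers turn malloc followed by a zeroing memset into calloc, which can leave the pages untouched.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Fixed-size block pool: all memory is obtained and touched once at
// construction, so acquire() and release() never go back to the heap.
class FixedPool {
public:
    FixedPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          block_count_(block_count),
          storage_(static_cast<unsigned char*>(
              std::malloc(block_size_ * block_count_))),
          free_list_(nullptr) {
        if (storage_ == nullptr) {
            throw std::bad_alloc();  // fail at startup, not under load
        }
        // Write to every byte so the kernel has to commit the pages now.
        std::memset(storage_, 0xA5, block_size_ * block_count_);
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < block_count_; ++i) {
            release(storage_ + i * block_size_);
        }
    }

    ~FixedPool() { std::free(storage_); }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Returns a block, or nullptr when the pool is exhausted.
    void* acquire() noexcept {
        if (free_list_ == nullptr) {
            return nullptr;
        }
        Node* node = free_list_;
        free_list_ = node->next;
        return node;
    }

    // Returns a block obtained from acquire() to the pool.
    void release(void* block) noexcept {
        Node* node = static_cast<Node*>(block);
        node->next = free_list_;
        free_list_ = node;
    }

private:
    struct Node { Node* next; };

    // Blocks must be able to hold a free-list node and stay aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        if (n < sizeof(Node)) {
            n = sizeof(Node);
        }
        return (n + align - 1) / align * align;
    }

    std::size_t block_size_;
    std::size_t block_count_;
    unsigned char* storage_;
    Node* free_list_;
};
```

If the OOM killer is going to come for the process, it will most likely do so during that memset at startup rather than in the middle of handling a connection. An exhausted pool shows up as a nullptr from acquire(), which the server can treat as "refuse this connection" instead of dying.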

NT does handle OOM with some grace because of its two-phase allocation architecture. This may have hurt NT's performance when systems had 8 megs of RAM, but with 2 gigs it looks pretty good. Don't underestimate Dave Cutler. There's some genius in there. But why couldn't he give us monitors? Well, you can't win them all.