The VM bug that delayed Linux-2.6.34-rc4

Arguably, the most complex area in operating systems is memory management, i.e., managing virtual memory (VM).

Linux-2.6.34-rc3 came out on March 30, and after nearly two weeks -rc4 had still not appeared. The reason? Well, we are going to look at it.

The anon_vma scalability patches submitted by Rik van Riel were merged in the -rc1 phase. Borislav Petkov, running -rc3, hit a bug that caused a crash while suspending to disk (hibernate). Linus chimed in, suspecting that it could be caused by Rik's new scalable anon_vma linking code. The bug usually appears under severe memory pressure: Borislav explains that the procedure to trigger it consistently is to run 3 KVM guests, open Firefox and load a huge HTML file, and then try to s2disk. Kaboom! Though Borislav himself suspected a hardware issue, since not many people had observed the bug, Linus refused to agree because he had seen a similar oops on the Mac Mini his kids use. So it was likely a real bug that needed to be identified and fixed. And since the anon_vma code is very complex, with various levels of locking and RCU usage, Linus wants to simplify mm/rmap.c considerably.

So the bug hunt began with Linus Torvalds and Rik van Riel, joined by Johannes Weiner, Kosaki Motohiro and Minchan Kim, with every patch tested by Borislav Petkov. After 10 days of debugging, many patches flying around, {,in}validating various theories, and finding and fixing 3 other independent bugs (1, 2, 3) in the VM area (though the second one may not be required), Linus came up with a new theory, which he explains along with a small patch. And Borislav confirmed that his netbook survived more than 20 suspend cycles, even under severe memory pressure.

Had the bug not been isolated and fixed, Linus was planning to revert the entire anon_vma scalability patch series. That didn't sound good, and neither did abandoning the effort when a fix felt so close. All 4 patches can now be found here: 1, 2, 3, 4. And with that, -rc4 is out in the wild.

Awesome.

Update: The as-usual-excellent LWN.net article: http://lwn.net/Articles/383162/
