[Cluster-summit-list] Memory inversion problem and solution

Daniel Phillips phillips at redhat.com
Thu Jun 16 05:53:57 UTC 2005


Hi all,

I suppose I should put this on the agenda.  After all the easy bugs are 
fixed, cluster filesystems have to grapple with perhaps the worst 
day-one bug remaining in Linux, what I have called "memory inversion 
deadlock".  That is, some component in the IO path of a block device 
needs to allocate memory, but the allocation blocks because the VM is 
busy trying to write out memory to that same block device, which is 
in turn trying to allocate memory...
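
To make the cycle concrete, here is a toy model in Python.  It is only 
a sketch under stated assumptions, not kernel code: the function name 
and the `pages_needed_per_write` parameter are hypothetical, and the 
page accounting is deliberately simplistic.  It shows that once free 
memory drops below what the IO path needs for a single write, writeout 
can make no progress at all, even though every completed write would 
have freed memory.

```python
def simulate_writeout(free_pages, dirty_pages, pages_needed_per_write):
    """Toy model of writeout under memory pressure.

    Returns (pages_cleaned, deadlocked).

    Assumption (hypothetical, for illustration only): writing one dirty
    page requires the IO path to allocate `pages_needed_per_write` free
    pages, which it releases again when the write completes.  A completed
    write also turns the dirty page into a free page.
    """
    cleaned = 0
    while dirty_pages > 0:
        if free_pages < pages_needed_per_write:
            # The IO path cannot allocate, so no write can complete,
            # so no memory is ever freed: the inversion deadlock.
            return cleaned, True
        free_pages -= pages_needed_per_write      # IO path allocates
        dirty_pages -= 1                          # the write completes
        cleaned += 1
        free_pages += pages_needed_per_write + 1  # release + reclaimed page
    return cleaned, False


if __name__ == "__main__":
    # No free memory at all: the very first write cannot start.
    print(simulate_writeout(free_pages=0, dirty_pages=8,
                            pages_needed_per_write=1))
    # Enough free memory for one write: writeout runs to completion.
    print(simulate_writeout(free_pages=1, dirty_pages=8,
                            pages_needed_per_write=1))
```

The model says nothing about how a real fix would work; it only shows 
why the state "no free memory, IO path must allocate" is a dead end.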

This nasty and subtle problem also manifests in several other common 
forms, and is particularly harmful to clusters, which tend to exercise 
the corner cases where the deadlock manifests.  The good news is, I 
have come up with what appears to be a definitive solution.

Part of the solution entails some stringent rules about how some parts 
of the cluster infrastructure are allowed to operate, and how they must 
be audited.  For example, fencing may not be implemented via shell 
scripts because we cannot prove bounded memory usage for a shell 
script.  The sad truth is that much cluster infrastructure code is 
going to have to be re-examined and rewritten in light of what has to 
be done to make the memory inversion deadlock problem finally go away.

Anybody want to see this on the agenda?  (Show of hands please.)

Regards,

Daniel



