Hi,

On Thu, 2012-11-08 at 14:59 -0500, David Teigland wrote:
> On Thu, Nov 08, 2012 at 06:48:19PM +0000, Steven Whitehouse wrote:
> > > Converting to NL would actually be less expensive than unlock because the
> > > NL convert does not involve a reply message, but unlock does.
> > >
> > I'm not entirely sure I follow... at least from the filesystem point of
> > view (and without your proposed change) both conversions and unlocks
> > result in a reply. Is this a dlm internal reply perhaps?
>
> Right, I was referring to the internal dlm reply over the network.
>
> > > So skipping the unlocks is a first step that gives us a big benefit very
> > > simply. To benefit even further, we could later look into skipping the
> > > "convert to NL" step also, and just abandoning the dlm locks in whatever
> > > mode they're in; but that's probably not as simple a change.
> > >
> >
> > Yes, that's true... the issue is that the glock state machine treats all
> > glocks on an individual basis, and the demotion to NL also deals with
> > any writing back and invalidating of the cache that's required at the
> > same time. So that makes it tricky to separate from the requests to the
> > dlm.
> >
> > That said, I'd like to be able to move towards dealing with batches of
> > glocks in the future, since that means we can provide a more favourable
> > ordering of i/o requests. That is not an easy thing to do though.
> >
> > In addition to the benefit for umount, I'm also wondering whether, if
> > these unlocks are relatively slow, we should look at what happens during
> > normal operation, where we do, from time to time, send unlock requests.
> > Those are mostly (though indirectly) in response to memory pressure. Is
> > there anything we can do there to speed things up, I wonder?
>
> The main thing would be to not use a completion callback for dlm_unlock
> (either make dlm not send one, or ignore it in gfs2). This would let you
> free the glock memory right away.
>
> But, removing unlock completions can create new problems, because you'd
> need to handle new errors from dlm_lock() when it ran up against an
> incomplete unlock. Dealing with that complication may negate any benefit
> from ignoring unlock completions. Unless, of course, you knew you
> wouldn't be making any more dlm_lock calls on that lock, e.g. during
> unmount.
>
Yes, I'm all for keeping things simple if we can.

I thought it might be an interesting exercise to do a few measurements in
this area, so I wrote a script to process the output of the tracepoints in
order to get some figures showing where things stand at the moment. To
match the timings of the DLM lock requests with state changes, I used
three tracepoints: gfs2_glock_lock_time, gfs2_glock_state_change and
gfs2_glock_put. I found that I had to increase the value of
/sys/kernel/debug/tracing/buffer_size_kb quite a lot in order to avoid
dropped tracepoint events during the test.

My test was really simple: mount a single-node gfs2 filesystem using dlm,
run postmark with 100000 transactions and files, and unmount. I processed
the results through the script to produce the attached graphs.

The two graphs are actually very similar to each other. The first is a
histogram (with log-sized buckets, and points plotted at the mid-point of
each bucket). The second is basically the same, but with each bucket's
count multiplied by the mid-point value of that bucket, so that it
represents the total time in seconds taken by the dlm requests falling
into that bucket.

I've included every state change which occurred during the test. Although
I can separate out final lock puts (i.e. NL to UN), the stats can't
separate out initial lock requests from conversions, so NL:EX (i.e. NL to
EX state change) may also include UN:EX, for example.

So at the end of all this, the two state changes which stick out as taking
longer than the others are NL:UN and EX:NL. Unfortunately it doesn't tell
us why that is...
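For reference, the bucketing works roughly along these lines (a simplified
Python sketch, not the actual script; the real one parses the tracepoint
output, which is omitted here, and the sample durations below are invented):

```python
import math
from collections import defaultdict

def log_bucket_histogram(durations):
    """Group per-request times (seconds) into log-sized buckets.

    Each bucket covers [2**n, 2**(n+1)) and is keyed by its
    mid-point, which is where the point is plotted on the graph.
    """
    counts = defaultdict(int)
    for d in durations:
        n = math.floor(math.log2(d))               # bucket index
        midpoint = (2.0 ** n + 2.0 ** (n + 1)) / 2
        counts[midpoint] += 1
    return dict(counts)

def weighted_histogram(hist):
    """The second graph: each bucket's count multiplied by its
    mid-point, approximating the total time spent in that bucket."""
    return {mid: cnt * mid for mid, cnt in hist.items()}

# Invented example: six DLM request times in seconds
samples = [0.001, 0.0015, 0.003, 0.02, 0.5, 0.6]
hist = log_bucket_histogram(samples)
totals = weighted_histogram(hist)
```

In the real script there is one such pair of histograms per state change
(NL:EX, EX:NL, NL:UN and so on), matched up via gfs2_glock_state_change.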
It may just be that these conversions were requested when the DLM was
especially busy, or when something else was going on which slowed down the
granting. I suspect, though, that most (all?) of the NL:UN conversions
happened during umount, since this workload doesn't generate any
noticeable memory pressure, so it is unlikely that cached glocks were
being ejected during the run. That may well also be the case for the EX
locks too, since they will have been cached from the point that the
objects in question requested an EX lock, as there are no other nodes
requesting demotions.

It would be interesting to see what the results look like from other
(more realistic) workloads,

Steve.