
[Cluster-devel] cluster/dlm-kernel/src lockqueue.c



CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL46
Changes by:	teigland sourceware org	2007-11-07 15:57:09

Modified files:
	dlm-kernel/src : lockqueue.c 

Log message:
	bz 349001
	
	For the entire life of the dlm, there's been an annoying issue that we've
	worked around and not "fixed" directly.  It's the source of all these
	messages:
	
	process_lockqueue_reply id 2c0224 state 0
	
	The problem is that a lock master sends an async "granted" message for a
	convert request *before* actually sending the reply for the original
	convert.  The work-around is that the requesting node just takes the
	granted message as an implicit reply to the conversion and ignores the
	convert reply when it arrives later (the message above is printed when
	it gets the out-of-order reply for its convert).  Apart from the annoying
	messages, it's never been a problem.
	
	Now we've found a case where it's a real problem:
	
	1. nodeA: send convert PR->CW to nodeB
	   nodeB: send granted message to nodeA
	   nodeB: send convert reply to nodeA
	2. nodeA: receive granted message for conversion,
	   complete request, sending ast to gfs
	3. nodeA: send convert CW->EX to nodeB
	4. nodeA: receive reply for convert in step 1, which we ordinarily
	   ignore, but since another convert has been sent, we mistake this
	   message for the reply to the convert in step 3, and complete
	   the convert request, which is *not* really completed yet
	5. nodeA: send unlock to nodeB
	   nodeB: complain about an unlock during a conversion
	
	The fix is to have nodeB not send a convert reply if it has already sent a
	granted message.  (We already do this for cases where the conversion is
	granted when first processing it, but we don't in cases where the grant
	is done after processing the convert.)

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/dlm-kernel/src/lockqueue.c.diff?cvsroot=cluster&only_with_tag=RHEL46&r1=1.37.2.9&r2=1.37.2.9.6.1

--- cluster/dlm-kernel/src/Attic/lockqueue.c	2006/01/24 14:38:19	1.37.2.9
+++ cluster/dlm-kernel/src/Attic/lockqueue.c	2007/11/07 15:57:08	1.37.2.9.6.1
@@ -590,6 +590,14 @@
 	req->rr_lvbseq = lkb->lkb_lvbseq;
 	add_request_lvb(lkb, req);
 
+	/* prevent a convert reply that hasn't been sent yet, the grant message
+	   will serve as an implicit convert reply */
+	if (lkb->lkb_request) {
+		log_debug(lkb->lkb_resource->res_ls, "skip convert reply %x "
+			  "gr %d\n", lkb->lkb_id, lkb->lkb_grmode);
+		lkb->lkb_request = NULL;
+	}
+
 	midcomms_send_buffer(&req->rr_header, e);
 }
 

