[Cluster-devel] [RFC][PATCH] dlm: Reset fs_notified when check_fs_done
Jiaju Zhang
jjzhang.linux at gmail.com
Sun Feb 28 22:06:32 UTC 2010
Hi,
About the issue where dlm_controld and fs_controld sit spinning,
retrying and replying for the fs_notified check, I have a suspicion
that another scenario may also hit that logic:
If node->fs_notified has been set to 1 by a previous change, then when
a new change comes and needs to check node->fs_notified, the flag has
not been reset to 0, so check_fs_done will succeed even if
dlm_controld has not received the notification from fs_controld this
time.
For example, given the membership changes n, n+1, n+2, let's see what
happens on node X:
Step 1: cg n: node Y leaves with reason CPG_REASON_NODEDOWN.
        Eventually, in node X's ls->node_history, node Y's
        fs_notified = 1.
Step 2: cg n+1: node Y joins ...
Step 3: cg n+2: node Y leaves with reason CPG_REASON_NODEDOWN. One
        possible scenario is: before fs_controld's notification
        arrives, dlm_controld already knows from the CPG message
        that node Y is down and has done a lot of work. It sees node
        Y's fs_notified = 1 (set in Step 1) and wrongly passes the fs
        check. So node Y's check_fs is reset to 0.
Step 4: fs_controld's notification arrives. It sees node Y's check_fs
        = 0, assumes dlm_controld does not yet know node Y is down,
        and retries sending the notification. But in fact,
        dlm_controld already knows this and has finished all the
        work, which results in the spinning ...
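To make the sequence above concrete, here is a minimal sketch that
replays Steps 1 to 3 for node Y. It is not the dlm_controld code:
struct node is a stripped-down stand-in, and check_fs_done_one(),
stale_check_passes() and the reset_notified switch are hypothetical
names made up for this illustration. Only the check_fs and fs_notified
fields and the if/else logic follow the patch below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for dlm_controld's per-node state. */
struct node {
	int nodeid;
	int check_fs;     /* set when a failed node still needs the fs check */
	int fs_notified;  /* set when fs_controld reports the node's failure */
};

/*
 * Hypothetical single-node version of the check.
 * Returns 1 if the fs check passes for this node, 0 if it must wait.
 * reset_notified selects the behaviour with (1) or without (0) the
 * proposed one-line fix.
 */
static int check_fs_done_one(struct node *node, int reset_notified)
{
	assert(node != NULL);

	if (!node->check_fs)
		return 1;

	if (node->fs_notified) {
		node->check_fs = 0;
		if (reset_notified)
			node->fs_notified = 0;  /* the proposed fix */
		return 1;
	}
	return 0;  /* still waiting for fs_controld */
}

/*
 * Replays cg n, n+1, n+2 for node Y and returns the result of the
 * check in cg n+2, taken before fs_controld's notification arrives.
 */
static int stale_check_passes(int reset_notified)
{
	struct node y = { .nodeid = 2, .check_fs = 0, .fs_notified = 0 };

	/* cg n: Y fails, fs_controld notifies, the check passes. */
	y.check_fs = 1;
	y.fs_notified = 1;
	(void)check_fs_done_one(&y, reset_notified);

	/* cg n+1: Y rejoins; nothing touches fs_notified here. */

	/* cg n+2: Y fails again; the check runs before the notification. */
	y.check_fs = 1;
	return check_fs_done_one(&y, reset_notified);
}
```

In this sketch, stale_check_passes(0) returns 1 (the check passes on
the stale flag from cg n), while stale_check_passes(1) returns 0 (the
check waits for the new notification).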
I'm not sure if I read the code correctly :-) Below is a patch which
resets node->fs_notified. Review and comments are highly
appreciated!
Thanks,
Jiaju
Signed-off-by: Jiaju Zhang <jjzhang.linux at gmail.com>
---
group/dlm_controld/cpg.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/group/dlm_controld/cpg.c b/group/dlm_controld/cpg.c
index d5245ce..b257595 100644
--- a/group/dlm_controld/cpg.c
+++ b/group/dlm_controld/cpg.c
@@ -636,6 +636,7 @@ static int check_fs_done(struct lockspace *ls)
 		if (node->fs_notified) {
 			node->check_fs = 0;
+			node->fs_notified = 0;
 		} else {
 			log_group(ls, "check_fs nodeid %d needs fs notify",
 				  node->nodeid);