
[Cluster-devel] [ipmitool] avoiding assertion hit in wrong state transition



Hi, 

I'm a GPS consultant at the Tokyo office.
I'm writing to you because I believe you are the package
maintainer of ipmitool.

A customer of mine had trouble with ipmitool. As a result, the
fencing mechanism of RHCS that the customer deployed sometimes
didn't work.
(I have already asked the customer to open a support ticket.)

While investigating the customer's problem I found something
strange in lanplus.c, and I've tried to write a patch to avoid
the problem. Normally I would submit this kind of patch to the
community project or to bugzilla. However, I don't have access
to hardware that can reproduce the problem, so the patch is
based on code reading alone. I'd like you to review and
evaluate it.

My customer uses hardware that responds to IPMI requests SLOWLY.
The installed package is OpenIPMI-2.0.16-11.el5.

The trouble:
------------------------------------------------------------------
See the following captured data. 

No.  Time                    Source     Destination   Protocol Length Info
1    18:22:19.896507 172.24.130.217   192.168.0.8     IPMI/ATCA 65     Req, Get Channel Authentication Capabilities, seq 0x00
2    18:22:19.900510 192.168.0.8      172.24.130.217  IPMI/ATCA 72     Rsp, Get Channel Authentication Capabilities, seq 0x00
3    18:22:19.900896 172.24.130.217   192.168.0.8     RMCP+    90     Session ID 0x0, payload type: RMCP+ Open Session Request
4    18:22:19.903996 192.168.0.8      172.24.130.217  RMCP+    94     Session ID 0x0, payload type: RMCP+ Open Session Response
5    18:22:19.905131 172.24.130.217   192.168.0.8     RMCP+    99     Session ID 0x0, payload type: RAKP Message 1
6    18:22:19.980600 192.168.0.8      172.24.130.217  RMCP+    118    Session ID 0x0, payload type: RAKP Message 2
7    18:22:19.981055 172.24.130.217   192.168.0.8     RMCP+    86     Session ID 0x0, payload type: RAKP Message 3
8    18:22:30.675843 192.168.0.8      172.24.130.217  RMCP+    78     Session ID 0x0, payload type: RAKP Message 4
9    18:22:30.675928 172.24.130.217   192.168.0.8     ICMP     106    Destination unreachable (Port unreachable)

The capture was taken while ipmitool was getting the status of the
managed system (192.168.0.8). The host running fenced
(172.24.130.217) answered datagram No. 8 with "Destination
unreachable" because that datagram arrived too late. The host
expected to receive "RAKP Message 4" soon after sending
"RAKP Message 3" (datagram No. 7). However, "RAKP Message 4" arrived
more than 10 seconds later, after the ipmitool process had already
exited. The problem reproduced only sometimes, not always.

The Strangeness
------------------------------------------------------------------
172.24.130.217 didn't retransmit RAKP Message 3. As far as I can tell
from reading ipmi_lanplus_send_payload in lanplus.c, it should have
resent the message 4 times, but there was no retransmission.

My analysis
------------------------------------------------------------------
Instead of using the fencing mechanism of RHCS, my customer ran
ipmitool directly, and it exited abnormally.

# /usr/bin/ipmitool -I lanplus -H '192.168.0.6' -U 'Administrator' -P '*' -v chassis power on

ipmitool: lanplus.c:2153: ipmi_lanplus_send_payload: Assertion `session->v2_data.session_state == LANPLUS_STATE_PRESESSION' failed.

It seems that this assertion failure is unavoidable when
retransmitting a datagram whose payload type is
IPMI_PAYLOAD_TYPE_RMCP_OPEN_REQUEST, IPMI_PAYLOAD_TYPE_RAKP_1, or
IPMI_PAYLOAD_TYPE_RAKP_3.

After the first transmission of such a datagram, the session state is
changed by the following code:

		/* Remember our connection state */
		switch (payload->payload_type)
		{
		case IPMI_PAYLOAD_TYPE_RMCP_OPEN_REQUEST:
			session->v2_data.session_state = LANPLUS_STATE_OPEN_SESSION_SENT;
			break;
		case IPMI_PAYLOAD_TYPE_RAKP_1:
			session->v2_data.session_state = LANPLUS_STATE_RAKP_1_SENT;
			break;
		case IPMI_PAYLOAD_TYPE_RAKP_3:
			session->v2_data.session_state = LANPLUS_STATE_RAKP_3_SENT;
			break;
		}

When retransmitting, the following assert statements, which check
session_state, may then fail:

			else if (payload->payload_type == IPMI_PAYLOAD_TYPE_RMCP_OPEN_REQUEST)
			{
				lprintf(LOG_DEBUG, ">> SENDING AN OPEN SESSION REQUEST\n");
				assert(session->v2_data.session_state == LANPLUS_STATE_PRESESSION);

			else if (payload->payload_type == IPMI_PAYLOAD_TYPE_RAKP_1)
			{
				lprintf(LOG_DEBUG, ">> SENDING A RAKP 1 MESSAGE\n");
				assert(session->v2_data.session_state ==
						 LANPLUS_STATE_OPEN_SESSION_RECEIEVED);

			else if (payload->payload_type == IPMI_PAYLOAD_TYPE_RAKP_3)
			{
				lprintf(LOG_DEBUG, ">> SENDING A RAKP 3 MESSAGE\n");
				assert(session->v2_data.session_state ==
						 LANPLUS_STATE_RAKP_2_RECEIVED);
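
To make the failure easier to see without the hardware, here is a
minimal standalone sketch in C. It is not code from lanplus.c: the
enum and the model_* function names are hypothetical, and it models
only the Open Session Request case, assuming that no response ever
arrives so that every attempt after the first is a retransmission.

```c
#include <stdbool.h>

/* Hypothetical, reduced model of the session states in lanplus.c. */
enum model_state {
    MODEL_PRESESSION,
    MODEL_OPEN_SESSION_SENT
};

/* Mirrors the precondition that the assert() enforces before an
 * Open Session Request is sent: the session must be in PRESESSION. */
static bool model_precondition_holds(enum model_state s)
{
    return s == MODEL_PRESESSION;
}

/* Simulates sending an Open Session Request `attempts` times with no
 * reply. Returns the 1-based attempt at which the precondition is
 * violated, or 0 if every attempt passes the check. */
static int model_first_failing_attempt(int attempts)
{
    enum model_state state = MODEL_PRESESSION;
    int attempt;

    for (attempt = 1; attempt <= attempts; attempt++) {
        if (!model_precondition_holds(state))
            return attempt;              /* assert() would abort here */

        /* "Remember our connection state" after sending. */
        state = MODEL_OPEN_SESSION_SENT;
    }
    return 0;
}
```

In this model a single transmission passes the check, but the first
retransmission already violates it, because the state was advanced
after the initial send and never reset.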


Am I missing something?


My patch
------------------------------------------------------------------
The patch is against 1.8.12-5.fc18. The intent is simple:
`last_state' records the state before the transition, and the
session state is restored from last_state when a retransmission
is needed.

--- lanplus.c.orig	2012-10-26 14:31:24.131945775 +0900
+++ lanplus.c	2012-10-26 14:35:59.107128857 +0900
@@ -2087,6 +2087,7 @@
 	int                   try = 0;
 	int                   xmit = 1;
 	time_t                ltime;
+	enum LANPLUS_SESSION_STATE last_state;
 
 	if (!intf->opened && intf->open && intf->open(intf) < 0)
 		return NULL;
@@ -2221,6 +2222,7 @@
 		usleep(100); 			/* Not sure what this is for */
 
 		/* Remember our connection state */
+		last_state = session->v2_data.session_state;
 		switch (payload->payload_type)
 		{
 		case IPMI_PAYLOAD_TYPE_RMCP_OPEN_REQUEST:
@@ -2280,6 +2282,15 @@
 		if (xmit) {
 			/* incremet session timeout each retry */
 			intf->session->timeout++;
+
+			/* Roll back the state transition */
+			switch (payload->payload_type) {
+			case IPMI_PAYLOAD_TYPE_RMCP_OPEN_REQUEST:
+			case IPMI_PAYLOAD_TYPE_RAKP_1:
+			case IPMI_PAYLOAD_TYPE_RAKP_3:
+				session->v2_data.session_state = last_state;
+				break;
+			}		
 		}
 
 		try++;
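
The save-and-restore idea can also be checked in isolation. The
following is a minimal sketch in C, assuming a reduced two-state model
and that no response ever arrives. The sketch_* names are hypothetical
and do not exist in lanplus.c; only the name last_state is borrowed
from the patch.

```c
#include <stdbool.h>

/* Hypothetical two-state model; not the real LANPLUS_SESSION_STATE. */
enum sketch_state {
    SKETCH_PRESESSION,
    SKETCH_OPEN_SESSION_SENT
};

/* Tries to send an Open Session Request up to `max_tries` times with
 * no reply. When `rollback` is true, the state saved before the
 * transition is restored before each retry. Returns how many attempts
 * passed the PRESESSION precondition. */
static int sketch_send_attempts(int max_tries, bool rollback)
{
    enum sketch_state state = SKETCH_PRESESSION;
    enum sketch_state last_state;
    int passed = 0;
    int i;

    for (i = 0; i < max_tries; i++) {
        if (state != SKETCH_PRESESSION)
            break;                      /* assert() would abort here */
        passed++;

        last_state = state;             /* remember the pre-send state */
        state = SKETCH_OPEN_SESSION_SENT;

        /* No response arrived, so a retransmission is needed. */
        if (rollback)
            state = last_state;         /* roll back the transition */
    }
    return passed;
}
```

With rollback enabled, all attempts pass the precondition in this
model; without it, only the first one does.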


Thanks in advance.
Masatake YAMATO

