[rhos-list] EXTERNAL: Re: Red Hat Linux VM freezes.

Haller, John H (John) john.haller at alcatel-lucent.com
Thu May 30 22:47:43 UTC 2013


Message content is bottom-posted; see the bottom.

> -----Original Message-----
> From: rhos-list-bounces at redhat.com [mailto:rhos-list-bounces at redhat.com] On
> Behalf Of Sudhir R Venkatesalu
> Sent: Wednesday, May 29, 2013 11:51 PM
> To: Steven Dake; Minton, Rich
> Cc: rhos-list at redhat.com
> Subject: Re: [rhos-list] EXTERNAL: Re: Red Hat Linux VM freezes.
> 
> Hello,
> 
> I am copying and pasting this from the OpenStack Operations Guide. Please read
> it; it might help you solve your issue.
> 
> Double VLAN
> I was on-site in Kelowna, British Columbia, Canada setting up a new OpenStack
> cloud.
> The deployment was fully automated: Cobbler deployed the OS on the bare
> metal, bootstrapped it, and Puppet took over from there. I had run the
> deployment scenario so many times in practice and took for granted that
> everything was working.
> On my last day in Kelowna, I was in a conference call from my hotel. In the
> background, I was fooling around on the new cloud. I launched an instance and
> logged in. Everything looked fine. Out of boredom, I ran ps aux and all of the
> sudden the instance locked up.
> Thinking it was just a one-off issue, I terminated the instance and launched a
> new one. By then, the conference call ended and I was off to the data center.
> At the data center, I was finishing up some tasks and remembered the lock-up. I
> logged into the new instance and ran ps aux again. It worked. Phew. I decided to
> run it one more time. It locked up. WTF.
> After reproducing the problem several times, I came to the unfortunate
> conclusion that this cloud did indeed have a problem. Even worse, my time was
> up in Kelowna and I had to return back to Calgary.
> Where do you even begin troubleshooting something like this? An instance just
> randomly locks when a command is issued. Is it the image? Nope - it happens on
> all images. Is it the compute node? Nope - all nodes. Is the instance locked up?
> No! New SSH connections work just fine!
> We reached out for help. A networking engineer suggested it was an MTU issue.
> Great!
> MTU! Something to go on! What's MTU and why would it cause a problem?
> MTU is the maximum transmission unit. It specifies the maximum number of bytes
> that the interface accepts for each packet. If two interfaces have two different
> MTUs, bytes might get chopped off and weird things happen -- such as random
> session lockups. It's
> important to note that not all packets have a size of 1500. Running the ls
> command over SSH might only create a single packet less than 1500 bytes.
> However, running a command with heavy output, such as ps aux, requires several
> packets of 1500 bytes.
> OK, so where is the MTU issue coming from? Why haven't we seen this in any
> other deployment? What's new in this situation? Well, new data center, new
> uplink, new switches, new model of switches, new servers, first time using this
> model of servers... so, basically everything was new. Wonderful. We toyed
> around with raising the MTU at various
> areas: the switches, the NICs on the compute nodes, the virtual NICs in the
> instances, we even had the data center raise the MTU for our uplink interface.
> Some changes worked, some didn't. This line of troubleshooting didn't feel right,
> though. We shouldn't have to be changing the MTU in these areas.
> As a last resort, our network admin (Alvaro) and I sat down with four
> terminal windows, a pencil, and a piece of paper. In one window, we ran ping. In
> the second window, we ran tcpdump on the cloud controller. In the third,
> tcpdump on the compute node. And the fourth had tcpdump on the instance. For
> background, this cloud was a multinode, non-multi-host setup.
> There was one cloud controller that acted as a gateway to all compute nodes.
> VlanManager was used for the network config. This means that the cloud
> controller and all compute nodes had a different VLAN for each OpenStack
> project. We used the -s option of ping to change the packet size. We watched as
> sometimes packets would fully return, sometimes they'd only make it out and
> never back in, and sometimes the packets would stop at a random point. We
> changed tcpdump to start displaying the hex dump of the packet. We pinged
> between every combination of outside, controller, compute, and instance.
> Finally, Alvaro noticed something. When a packet from the outside hits the
> cloud controller, it should not be configured with a VLAN. We verified this as
> true. When the packet went from the cloud controller to the compute node, it
> should only have a VLAN if it was destined for an instance. This was still true.
> When the ping reply was sent from the instance, it should be in a VLAN. True.
> When it came back to the cloud controller and on its way out to the public
> internet, it should no longer have a VLAN. False. Uh oh. It looked as though the
> VLAN part of the packet was not being removed.
> That made no sense.
> While bouncing this idea around in our heads, I was randomly typing commands
> on the compute node:
> $ ip a
> ...
> 10: vlan100@vlan20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> noqueue master br100 state UP ...
> "Hey Alvaro, can you run a VLAN on top of a VLAN?"
> "If you did, you'd add an extra 4 bytes to the packet..."
> Then it all made sense...
> $ grep vlan_interface /etc/nova/nova.conf
> vlan_interface=vlan20
> In nova.conf, vlan_interface specifies what interface OpenStack should attach all
> VLANs to. The correct setting should have been:
> vlan_interface=bond0
> since bond0 is the server's bonded NIC.
> vlan20 is the VLAN that the data center gave us for outgoing public internet
> access. It's a correct VLAN and is also attached to bond0.
> By mistake, I configured OpenStack to attach all tenant VLANs to vlan20 instead
> of bond0, thereby stacking one VLAN on top of another. That added an extra
> 4 bytes to each packet, which caused 1504-byte packets to be sent out, which
> caused problems when they arrived at an interface that only accepted 1500!
> As soon as this setting was fixed, everything worked.
> 
> 
> 
> Regards,
> Sudhir.
> 
> -----Original Message-----
> From: rhos-list-bounces at redhat.com [mailto:rhos-list-bounces at redhat.com] On
> Behalf Of Steven Dake
> Sent: Thursday, May 30, 2013 4:55 AM
> To: Minton, Rich
> Cc: rhos-list at redhat.com
> Subject: Re: [rhos-list] EXTERNAL: Re: Red Hat Linux VM freezes.
> 
> On 05/28/2013 10:22 AM, Minton, Rich wrote:
> > It starts to work at an MTU of 1468.
> Rich,
> 
> IP Header is 20 bytes, TCP header is 20 bytes for a total of 40 bytes.
> Not sure where the magic 32 bytes is coming from.  Maybe a VLAN tag?  Is it
> possible your switch is configured with a smaller MTU than 1500 or some odd
> VLAN setup?
> 
> Regards
> -steve
> 
> > -----Original Message-----
> > From: Brent Eagles [mailto:beagles at redhat.com]
> > Sent: Tuesday, May 28, 2013 1:14 PM
> > To: Minton, Rich
> > Cc: rhos-list at redhat.com
> > Subject: Re: EXTERNAL: Re: [rhos-list] Red Hat Linux VM freezes.
> >
> > Hi Rick,
> >
> > ----- Original Message -----
> >> This is interesting...
> >>
> >> We were able to resolve (or band-aid) our problem by setting the VMs'
> >> eth0 MTU to 1000.
> >>
> >> Has anyone else encountered this problem? Any ideas why this is happening?
> >>
> >> Rick
> > I suspected that might be the case when I asked for the ifconfig at the
> beginning. I was having some difficulty reproducing it so I was hesitant to
> recommend altering it. I've heard of SSL/SSH related issues with MTU size and
> was wondering about the effect on payloads that network namespaces might
> have in conjunction with SSH. The hard drive on my test system failed yesterday
> so I'm a little behind unfortunately. I'd like to find out what the failure threshold
> is and what contributes to the delta between 1500 and that threshold.
> >
> > Cheers,
> >
> > Brent

Is any particular encapsulation being used, such as GRE? Long ago, I had similar problems when I was using DSL, and the Ethernet was encapsulated into ATM, which only allowed 1500 bytes, so all the packets got fragmented. OVS can support multiple encapsulation types, but encapsulations other than a single VLAN are likely to have problems with MTU. NVGRE, VXLAN, or MPLS would all have problems.
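
A quick way to see how much overhead is in play is to probe the usable path MTU from inside a guest with don't-fragment pings, stepping the payload size down until replies come back; the address and sizes below are only examples:

$ ping -M do -s 1472 192.168.1.1   # 1472 + 8 ICMP + 20 IP = a 1500-byte packet
$ ping -M do -s 1440 192.168.1.1   # keep shrinking until this gets replies

The largest payload that gets through, plus 28 bytes of ICMP/IP headers, is the real path MTU; the difference from 1500 is roughly the number of bytes the encapsulation is eating.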

It would be really helpful if Linux, its network drivers, and the physical devices supported the 802.3-defined envelope frames*, which allow the frame to be up to 1982 bytes when an encapsulation layer is present, so that encapsulation can be added without having to change the IP-layer MTU above or below 1500. The normal 802 standards only allow a single VLAN tag, with a corresponding extension of the MTU to 1504, unless envelope frames are used.
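
Absent envelope frames, the usual workaround is to raise the MTU on the underlying physical interface so that the tag or tunnel overhead fits without shrinking the 1500-byte MTU the guests see. A rough sketch, assuming the switch ports are configured to carry the larger frames (interface names and sizes are only examples):

$ ip link set dev bond0 mtu 1550   # leave room for ~50 bytes of overlay headers
$ ip link set dev br100 mtu 1550   # any bridges or VLAN interfaces stacked on it as well

Whether that is possible at all depends on the NIC, driver, and switch hardware, which is exactly the gap envelope frames are meant to close.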

Also, Linux only supports double-tagged VLANs by accident, and only if both tags use the same VLAN EtherType. Support is not ubiquitous, and in fact, with large VLAN IDs and multiple IP addresses, the interface name won't fit within the 15-character limit. For example, eth0.1001.1002:0 is 16 characters. The only 802.3-approved way to support double-tagged VLANs is by using envelope frames, which, as stated above, are not supported in Linux.

* 802.3 section 1.4.151 and 3.2.7.
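
For illustration, the accidental stacking can be reproduced by hand with iproute2; the interface names and VLAN IDs below are made up, and both tags get the default 802.1Q EtherType, which is the "by accident" case described above:

$ ip link add link eth0 name eth0.1001 type vlan id 1001
$ ip link add link eth0.1001 name eth0.1001.1002 type vlan id 1002

"eth0.1001.1002" is already 14 characters, so appending an alias suffix such as ":0" for a second address overruns the 15 usable characters the kernel allows for an interface name.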

Regards,
John Haller



