[Rdo-list] Fedora 20 / Devstack Networking Issues

Ian McLeod imcleod at redhat.com
Tue Jan 28 23:06:35 UTC 2014


On Tue, 2014-01-28 at 10:56 -0500, Ben Nemec wrote:
> 
> ----- Original Message -----
> > On 01/26/2014 12:01 AM, Perry Myers wrote:
> > > Ok, I've been chasing down some networking issues along with some other
> > > folks.  Here's what I'm seeing:
> > > 
> > > Starting with a vanilla F20 cloud image running on a F20 host, clone
> > > devstack into it and run stack.sh.
> > > 
> > > First, the RabbitMQ server issue I noted a few weeks ago is still
> > > there intermittently.  During the step where rabbitmqctl is run to
> > > set the password of the rabbit admin user, it can fail, and when it
> > > does, all subsequent AMQP communication fails, which makes a lot of
> > > the nova commands in devstack fail as well.
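> > > 
> > > For reference, the setup step in question runs something along these
> > > lines (a sketch; the exact invocation varies by devstack version, and
> > > "guest" is devstack's default rabbit user):
> > > 
> > > $ sudo rabbitmqctl change_password guest $RABBIT_PASSWORD
> > > 
> > > Presumably this races with the rabbit server still coming up.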
> > > 
> > > But... if you get past this error (since it is intermittent), then
> > > devstack seems to complete successfully.  Standard commands like nova
> > > list, keystone user-list, etc. all work fine.
> > > 
> > > I did note though that access to Horizon does not work.  I need to
> > > investigate this further.
> > > 
> > > But worse than that: when you run nova boot, the host-to-guest
> > > networking (remember, this is devstack running in a VM) immediately
> > > gets disconnected.  This issue is 100% reproducible and multiple
> > > users are reporting it (tsedovic, eharney, bnemec cc'd).
> > > 
> > > I did some investigation when this happens and here's what I found...
> > > 
> > > When I did:
> > > 
> > > $ brctl delif br100 eth0
> > > 
> > > I was immediately able to ping the guest from the host and vice versa.
> > > 
> > > If I reattach eth0 to br100 (brctl addif br100 eth0), networking
> > > stops again.
> > > 
> > > Another thing... I notice that on this system br100 does not have
> > > an IP address, but eth0 does.  I thought that with bridged
> > > networking like this, the bridge should hold the IP address and the
> > > physical interface attached to the bridge should not get one.
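> > > 
> > > A quick way to see which interface holds the address (standard
> > > iproute2 commands):
> > > 
> > > $ ip -4 addr show dev eth0
> > > $ ip -4 addr show dev br100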
> > > 
> > > So... I tweaked /etc/sysconfig/network-scripts/ifcfg-eth0 to remove
> > > dhcp from the BOOTPROTO line, and I copied ifcfg-eth0 to ifcfg-br100,
> > > allowing the bridge to use BOOTPROTO=dhcp.
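> > > 
> > > The resulting files looked roughly like this (a reconstruction;
> > > BRIDGE=br100 in ifcfg-eth0 is an assumption, one way to keep eth0
> > > attached to the bridge, otherwise re-add it with brctl addif):
> > > 
> > > # /etc/sysconfig/network-scripts/ifcfg-eth0
> > > DEVICE=eth0
> > > ONBOOT=yes
> > > BOOTPROTO=none
> > > BRIDGE=br100
> > > 
> > > # /etc/sysconfig/network-scripts/ifcfg-br100
> > > DEVICE=br100
> > > TYPE=Bridge
> > > ONBOOT=yes
> > > BOOTPROTO=dhcp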
> > > 
> > > I brought both interfaces down and then brought them both up, eth0
> > > first and br100 second.
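> > > 
> > > i.e., something like:
> > > 
> > > $ sudo ifdown br100 ; sudo ifdown eth0
> > > $ sudo ifup eth0 ; sudo ifup br100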
> > > 
> > > This time, br100 got the DHCP address from the host and networking
> > > worked fine.
> > > 
> > > So is this just an issue with how nova is setting up bridges?
> > > 
> > > Since this network disconnect didn't happen until nova launched a VM,
> > > I imagine this isn't a problem with devstack itself, but is likely an
> > > issue with Nova Networking somehow.
> > > 
> > > Russell/DanS, is there any chance that all of the refactoring you did
> > > in Nova Networking very recently introduced a regression?
> > 
> > I suppose it's possible.  You could try going back to before any of the
> > nova-network-objects patches went in.  The first one to merge was:
> > 
> > commit a8c73c7d3298589440579d67e0c5638981dd7718
> > Merge: a1f6e85 aa40c8f
> > Author: Jenkins <jenkins at review.openstack.org>
> > Date:   Wed Jan 15 18:38:37 2014 +0000
> > 
> >     Merge "Make nova-network use Service object"
> > 
> > Try going back to before that and see if you get a different result.  If
> > so, try using "git bisect" to find the offending commit.
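> > 
> > A rough sketch of that workflow, assuming the devstack-cloned nova
> > tree lives in /opt/stack/nova (the test step is whatever reproduces
> > the disconnect for you, e.g. nova boot):
> > 
> > $ cd /opt/stack/nova
> > $ git bisect start
> > $ git bisect bad HEAD
> > $ git bisect good a8c73c7d3298589440579d67e0c5638981dd7718^
> > # after each checkout: restart the nova services, boot an instance,
> > # then report the result; repeat until bisect names the bad commit
> > $ git bisect good   # networking survived
> > $ git bisect bad    # networking died
> > $ git bisect reset  # restore the original checkout when finished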
> > 
> > --
> > Russell Bryant
> > 
> 
> I am still unable to reproduce the networking issue in my environment.
> I booted a stock Fedora 20 cloud image, installed git, cloned devstack,
> and ran it with a minimal localrc configuration (so using the defaults
> of nova-network and rabbitmq).  Other than the rabbitmq race issue that
> always makes my first stack.sh run on a new VM fail, I had no problem
> completing stack.sh and booting a nova instance.  The instance's IP was
> correctly moved to the bridge for me.  If this is a regression in
> nova-network, then it only presents in combination with some other
> circumstance that isn't present for me.
> 
> As I mentioned in our off-list discussion of this, I run my development
> VMs in a local OpenStack installation I have, so maybe there's some
> difference in the way the networking works there.  In any case, while I
> don't have an answer, hopefully another data point will help in
> figuring this out.

OK.  I can provide a data point that may or may not be useful.

I am getting the same behavior Perry reported.  The install is
successful; the first VM launch kills the network.  After much head
bashing and experimentation, I found that everything worked correctly
if I assigned FLAT_INTERFACE in my localrc file to a second, unused NIC
on my test system, e.g.:

FLAT_INTERFACE=p4p2

Prior to that, I had all of the various localrc interface variables
pointing at the single primary NIC, e.g.:

HOST_IP_IFACE=p4p1
PUBLIC_INTERFACE=p4p1
VLAN_INTERFACE=p4p1
FLAT_INTERFACE=p4p1
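
In other words, the working combination for me is the same set of
variables with only FLAT_INTERFACE moved to the spare NIC:

HOST_IP_IFACE=p4p1
PUBLIC_INTERFACE=p4p1
VLAN_INTERFACE=p4p1
FLAT_INTERFACE=p4p2

(The pXpY interface names are obviously specific to my hardware.)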

This style of config (everything on one NIC) is advocated in the
single-node getting started guide:

http://devstack.org/guides/single-machine.html

I observed this while testing on commit:

3f5250fff3007dfd1e5992c0cf229be9033a5726

-Ian

> 
> -Ben
