[rdo-list] Problem with ha-router

Cedric Lecomte clecomte at redhat.com
Mon Oct 23 08:23:43 UTC 2017


Hello all,

I tried to deploy RDO Pike without containers on our internal platform.

The setup is pretty simple:
 - 3 Controllers in HA
 - 5 Ceph
 - 4 Compute
 - 3 Object-Store

I didn't use any exotic parameters.
This is my deployment command:

openstack overcloud deploy --templates \
  -e environement.yaml \
  --ntp-server 0.pool.ntp.org \
  -e storage-env.yaml \
  -e network-env.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph.yaml \
  --control-scale 3 --control-flavor control \
  --compute-scale 4 --compute-flavor compute \
  --ceph-storage-scale 5 --ceph-storage-flavor ceph-storage \
  --swift-storage-flavor swift-storage --swift-storage-scale 3 \
  -e scheduler_hints_env.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml

*environnement.yaml :*
parameter_defaults:
  ControllerCount: 3
  ComputeCount: 4
  CephStorageCount: 5
  OvercloudCephStorageFlavor: ceph-storage
  CephDefaultPoolSize: 3
  ObjectStorageCount: 3

*network-env.yaml :*
resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/templates/nic-configs/compute.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/templates/nic-configs/controller.yaml
  OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/templates/nic-configs/ceph-storage.yaml
  OS::TripleO::ObjectStorage::Net::SoftwareConfig: /home/stack/templates/nic-configs/swift-storage.yaml

parameter_defaults:
  InternalApiNetCidr: 172.16.0.0/24
  TenantNetCidr: 172.17.0.0/24
  StorageNetCidr: 172.18.0.0/24
  StorageMgmtNetCidr: 172.19.0.0/24
  ManagementNetCidr: 172.20.0.0/24
  ExternalNetCidr: 10.41.11.0/24
  InternalApiAllocationPools: [{'start': '172.16.0.10', 'end': '172.16.0.200'}]
  TenantAllocationPools: [{'start': '172.17.0.10', 'end': '172.17.0.200'}]
  StorageAllocationPools: [{'start': '172.18.0.10', 'end': '172.18.0.200'}]
  StorageMgmtAllocationPools: [{'start': '172.19.0.10', 'end': '172.19.0.200'}]
  ManagementAllocationPools: [{'start': '172.20.0.10', 'end': '172.20.0.200'}]
  # Leave room for floating IPs in the External allocation pool
  ExternalAllocationPools: [{'start': '10.41.11.10', 'end': '10.41.11.30'}]
  # Set to the router gateway on the external network
  ExternalInterfaceDefaultRoute: 10.41.11.254
  # Gateway router for the provisioning network (or Undercloud IP)
  ControlPlaneDefaultRoute: 192.168.131.253
  # The IP address of the EC2 metadata server. Generally the IP of the Undercloud
  EC2MetadataIp: 192.0.2.1
  # Define the DNS servers (maximum 2) for the overcloud nodes
  DnsServers: ["10.38.5.26"]
  InternalApiNetworkVlanID: 202
  StorageNetworkVlanID: 203
  StorageMgmtNetworkVlanID: 204
  TenantNetworkVlanID: 205
  ManagementNetworkVlanID: 206
  ExternalNetworkVlanID: 198
  NeutronExternalNetworkBridge: "''"
  ControlPlaneSubnetCidr: '24'
  BondInterfaceOvsOptions:
      "mode=balance-xor"
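
As a quick sanity check on the addressing in network-env.yaml, here is a minimal sketch that verifies each allocation pool falls inside its network's CIDR. The values are copied from the file above; the `NETWORKS` table and the `pool_errors` helper are names made up for this illustration and are not part of TripleO.

```python
import ipaddress

# (CIDR, pool start, pool end) copied from network-env.yaml above.
# The dictionary layout and helper name are assumptions for this sketch.
NETWORKS = {
    "InternalApi": ("172.16.0.0/24", "172.16.0.10", "172.16.0.200"),
    "Tenant": ("172.17.0.0/24", "172.17.0.10", "172.17.0.200"),
    "Storage": ("172.18.0.0/24", "172.18.0.10", "172.18.0.200"),
    "StorageMgmt": ("172.19.0.0/24", "172.19.0.10", "172.19.0.200"),
    "Management": ("172.20.0.0/24", "172.20.0.10", "172.20.0.200"),
    "External": ("10.41.11.0/24", "10.41.11.10", "10.41.11.30"),
}


def pool_errors(networks):
    """Return a list of human-readable problems; empty if every pool fits."""
    errors = []
    for name, (cidr, start, end) in networks.items():
        net = ipaddress.ip_network(cidr)
        first = ipaddress.ip_address(start)
        last = ipaddress.ip_address(end)
        if first not in net or last not in net:
            errors.append(f"{name}: pool {start}-{end} is outside {cidr}")
        elif first > last:
            errors.append(f"{name}: pool start {start} is after end {end}")
    return errors


if __name__ == "__main__":
    problems = pool_errors(NETWORKS)
    print("\n".join(problems) if problems else "all allocation pools fit their CIDRs")
```

This only checks the arithmetic of the pools; it says nothing about VLAN tagging or switch configuration.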

*storage-env.yaml :*
parameter_defaults:
  ExtraConfig:
    ceph::profile::params::osds:
        '/dev/sdb': {}
        '/dev/sdc': {}
        '/dev/sdd': {}
        '/dev/sde': {}
        '/dev/sdf': {}
        '/dev/sdg': {}
  SwiftRingBuild: false
  RingBuild: false


*scheduler_hints_env.yaml*
parameter_defaults:
    ControllerSchedulerHints:
        'capabilities:node': 'control-%index%'
    NovaComputeSchedulerHints:
        'capabilities:node': 'compute-%index%'
    CephStorageSchedulerHints:
        'capabilities:node': 'ceph-storage-%index%'
    ObjectStorageSchedulerHints:
        'capabilities:node': 'swift-storage-%index%'

After a little use, I found that one controller is unable to get an active
ha-router, and I got this output:

neutron l3-agent-list-hosting-router XXX
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| 420a7e31-bae1-4f8c-9438-97839cf190c4 | overcloud-controller-0.localdomain | True           | :-)   | standby  |
| 6a943aa5-6fd1-4b44-8557-f0043b266a2f | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| dd66ef16-7533-434f-bf5b-25e38c51375f | overcloud-controller-2.localdomain | True           | :-)   | standby  |
+--------------------------------------+------------------------------------+----------------+-------+----------+

So each time a router is scheduled on this controller, I can't get an active
router. I compared the configurations, but everything seems to be fine. I
redeployed to see if it helped, and the only thing that changed is the
controller on which the ha-routers are stuck.
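
To spot affected routers without reading the table by eye, here is a minimal sketch that parses the ASCII table printed by `neutron l3-agent-list-hosting-router` and reports whether any agent is in the `active` state. The sample text is the output shown above; `parse_ha_states` and `has_active` are hypothetical helper names, and the sketch assumes the five-column layout shown here.

```python
# Output of "neutron l3-agent-list-hosting-router XXX" as posted above.
SAMPLE_OUTPUT = """
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| 420a7e31-bae1-4f8c-9438-97839cf190c4 | overcloud-controller-0.localdomain | True           | :-)   | standby  |
| 6a943aa5-6fd1-4b44-8557-f0043b266a2f | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| dd66ef16-7533-434f-bf5b-25e38c51375f | overcloud-controller-2.localdomain | True           | :-)   | standby  |
+--------------------------------------+------------------------------------+----------------+-------+----------+
"""


def parse_ha_states(table_text):
    """Map each L3 agent host to its ha_state (assumes the 5-column table)."""
    states = {}
    for raw in table_text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip blank lines and +----+ borders
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 5 or cells[0] == "id":
            continue  # skip the header row and anything unexpected
        _agent_id, host, _admin_up, _alive, ha_state = cells
        states[host] = ha_state
    return states


def has_active(states):
    """True if at least one agent reports the router as active."""
    return "active" in states.values()


if __name__ == "__main__":
    states = parse_ha_states(SAMPLE_OUTPUT)
    for host, state in sorted(states.items()):
        print(f"{host}: {state}")
    if not has_active(states):
        print("no L3 agent is active for this router")
```

In practice the table text would come from running the command once per router; that loop is left out here because it depends on the environment.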

The only message that I got is from OVS:

2017-10-20 08:38:44.930 136145 WARNING neutron.agent.rpc [req-0ad9aec4-f718-498f-9ca7-15b265340174 - - - - -] Device Port(admin_state_up=True,allowed_address_pairs=[],binding=PortBinding,binding_levels=[],created_at=2017-10-20T08:38:38Z,data_plane_status=<?>,description='',device_id='a7e23552-9329-4572-a69d-d7f316fcc5c9',device_owner='network:router_ha_interface',dhcp_options=[],distributed_binding=None,dns=None,fixed_ips=[IPAllocation],id=7b6d81ef-0451-4216-9fe5-52d921052cb7,mac_address=fa:16:3e:13:e9:3c,name='HA port tenant 0ee0af8e94044a42923873939978ed42',network_id=ffe5ffa5-2693-4d35-988e-7290899601e0,project_id='',qos_policy_id=None,revision_number=5,security=PortSecurity(7b6d81ef-0451-4216-9fe5-52d921052cb7),security_group_ids=set([]),status='DOWN',updated_at=2017-10-20T08:38:44Z) is not bound.
2017-10-20 08:38:44.944 136145 WARNING neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-0ad9aec4-f718-498f-9ca7-15b265340174 - - - - -] Device 7b6d81ef-0451-4216-9fe5-52d921052cb7 not defined on plugin or binding failed
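
Since the warning packs the whole port object onto one line, here is a minimal sketch that pulls out the few fields that identify the port (ID, owner, status, network). The log text is the first warning above; `port_fields` is a hypothetical helper, and the regular expression only assumes the `key=value` layout visible in this message, not any documented log format.

```python
import re

# First WARNING from the OVS agent log above, split across literals for width.
LOG_LINE = (
    "2017-10-20 08:38:44.930 136145 WARNING neutron.agent.rpc "
    "[req-0ad9aec4-f718-498f-9ca7-15b265340174 - - - - -] Device "
    "Port(admin_state_up=True,allowed_address_pairs=[],"
    "binding=PortBinding,binding_levels=[],"
    "created_at=2017-10-20T08:38:38Z,data_plane_status=<?>,description='',"
    "device_id='a7e23552-9329-4572-a69d-d7f316fcc5c9',"
    "device_owner='network:router_ha_interface',dhcp_options=[],"
    "distributed_binding=None,dns=None,fixed_ips=[IPAllocation],"
    "id=7b6d81ef-0451-4216-9fe5-52d921052cb7,mac_address=fa:16:3e:13:e9:3c,"
    "name='HA port tenant 0ee0af8e94044a42923873939978ed42',"
    "network_id=ffe5ffa5-2693-4d35-988e-7290899601e0,project_id='',"
    "qos_policy_id=None,revision_number=5,"
    "security=PortSecurity(7b6d81ef-0451-4216-9fe5-52d921052cb7),"
    "security_group_ids=set([]),status='DOWN',"
    "updated_at=2017-10-20T08:38:44Z) is not bound."
)


def port_fields(log_line, keys=("id", "device_owner", "status", "network_id")):
    """Extract selected key=value pairs from a 'Device Port(...)' warning."""
    found = {}
    for key in keys:
        # The key must follow "(" or "," so that "id" does not match inside
        # "device_id" or "network_id"; values may be single-quoted.
        pattern = r"[(,]" + re.escape(key) + r"='?([^',)]*)'?"
        match = re.search(pattern, log_line)
        if match:
            found[key] = match.group(1)
    return found


if __name__ == "__main__":
    for key, value in port_fields(LOG_LINE).items():
        print(f"{key} = {value}")
```

The extracted port ID is the same one named in the second warning, which makes it easy to correlate the two messages; the sketch does not explain why the binding failed.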

Any ideas?

-- 

LECOMTE Cedric

Senior Software Engineer

Red Hat

<https://www.redhat.com>

clecomte at redhat.com