[rhos-list] Problems with quantum and dhcp-agent
Steven Dake
sdake at redhat.com
Wed Jun 5 20:14:13 UTC 2013
On 06/05/2013 12:14 AM, Gary Kotton wrote:
> On 06/05/2013 01:17 AM, Steven Dake wrote:
>> On 06/04/2013 01:09 PM, S Manoo wrote:
>>> Looking into this further, I'm observing the same error message
>>> relating to timeouts talking to qpid in dhcp-agent.log after every
>>> restart. Perhaps this is why I'm unable to get any DHCP responses to
>>> instances? Any suggestions on what's causing this and where I might
>>> look to troubleshoot this further?
>
> When a host is restarted, each process needs to register with the
> message broker. If you are running all of the services on the same
> host, they will only be able to connect once the qpid service is
> up and running, which usually takes a few seconds after reboot. If a
> service does not receive an answer from the qpid service, it will
> wait and retry. This is why you see the timeouts. The wait between
> retries increases each time. I have seen that all services are
> usually able to connect within a minute of booting a host (we should
> try to reduce this time).
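The retry behaviour described above (keep trying, waiting longer each time, until qpid answers) can be pictured with a minimal Python sketch. It assumes the wait doubles after each failed attempt and is capped at 60 seconds; connect_with_retry, its parameters and the fake broker are hypothetical illustrations, not the actual quantum or qpid reconnect code.

```python
import time


def connect_with_retry(connect, initial_delay=1.0, max_delay=60.0,
                       max_attempts=None, sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure.

    connect is a hypothetical zero-argument callable that raises
    ConnectionError while the broker is not reachable yet.
    """
    delay = initial_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except ConnectionError:
            # Give up only if a finite attempt limit was requested.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            # Double the wait for the next attempt, but never beyond max_delay.
            delay = min(delay * 2, max_delay)


# Usage: a fake broker that only accepts the fourth connection attempt.
attempts = []


def fake_connect():
    attempts.append(1)
    if len(attempts) < 4:
        raise ConnectionError("broker not up yet")
    return "connected"


waits = []  # record the waits instead of actually sleeping
result = connect_with_retry(fake_connect, sleep=waits.append)
print(result, waits)  # connected [1.0, 2.0, 4.0]
```

With doubling waits of 1, 2, 4, 8, 16 and 32 seconds, a service that keeps failing has waited about a minute in total, which is in line with the "within a minute" observation, though the real intervals may differ.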
>
> Please note that the quantum CLI has a command, quantum agent-list,
> which lists the agents, their status and the hosts they are running
> on.
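As far as I understand, the agent status shown by quantum agent-list is derived from the periodic state reports each agent sends (the report_state call that times out in the logs quoted below). A minimal sketch of such a liveness check, assuming an agent counts as alive when its last report is no older than agent_down_time, with 9 seconds used as an assumed default (check your quantum.conf); is_agent_alive is a hypothetical helper, not the quantum server's code.

```python
from datetime import datetime, timedelta

# Assumed default for agent_down_time; verify against your configuration.
AGENT_DOWN_TIME = timedelta(seconds=9)


def is_agent_alive(last_heartbeat, now, down_time=AGENT_DOWN_TIME):
    """Return True if the agent's last state report is recent enough."""
    return now - last_heartbeat <= down_time


# Usage with timestamps in the style of the logs below.
now = datetime(2013, 6, 4, 12, 51, 44)
print(is_agent_alive(now - timedelta(seconds=4), now))   # True
print(is_agent_alive(now - timedelta(seconds=60), now))  # False
```

If every report_state call times out, as in the logs below, no fresh report reaches the server, so agent-list would be expected to show the DHCP agent as not alive.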
>
> If you spin up an instance after the DHCP agent is up and running, do
> you see the problem?
>
>>>
>> S Manoo,
>>
>> We may have just fixed a bug related to this problem; the fix is not
>> in the preview. Please try the workaround in this bugzilla:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=970453
>
> This fix is good for an all-in-one setup but will not help if the DHCP
> agent is running on another host. In Quantum we have the notion of a
> network node. Please look at
> https://docs.google.com/drawings/d/167gegaoTBZpd318b2JTgF_Qi9YdkIX8pcQ6YBJLUtGY/edit?usp=sharing
>
> If the message broker goes down (for example, because of a host reboot
> or network problems), the DHCP agent will try to reconnect.
>
Gary,
I have found that the DHCP agent stops responding permanently in this
condition on an all-in-one setup. Perhaps the same is true for multinode
(i.e. the retry logic doesn't work as expected). I don't have multiple
nodes to test on, but it might be worth double-checking if you do.
Regards
-steve
>>
>> Regards
>> -steve
>>
>>
>>> */var/log/quantum/dhcp-agent.log:*
>>> 2013-06-04 12:50:44 INFO [quantum.common.config] Logging enabled!
>>> 2013-06-04 12:50:44 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on localhost:5672
>>> 2013-06-04 12:50:44 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on localhost:5672
>>> 2013-06-04 12:50:44 INFO [quantum.agent.dhcp_agent] DHCP agent started
>>> 2013-06-04 12:51:44 ERROR [quantum.agent.dhcp_agent] Failed reporting state!
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.6/site-packages/quantum/agent/dhcp_agent.py", line 700, in _report_state
>>>     self.agent_state)
>>>   File "/usr/lib/python2.6/site-packages/quantum/agent/rpc.py", line 66, in report_state
>>>     topic=self.topic)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/proxy.py", line 80, in call
>>>     return rpc.call(context, self._get_topic(topic), msg, timeout)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/__init__.py", line 140, in call
>>>     return _get_impl().call(CONF, context, topic, msg, timeout)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 611, in call
>>>     rpc_amqp.get_connection_pool(conf, Connection))
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 613, in call
>>>     rv = list(rv)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 555, in __iter__
>>>     self.done()
>>>   File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
>>>     self.gen.next()
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 552, in __iter__
>>>     self._iterator.next()
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 436, in iterconsume
>>>     yield self.ensure(_error_callback, _consume)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 380, in ensure
>>>     error_callback(e)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 421, in _error_callback
>>>     raise rpc_common.Timeout()
>>> Timeout: Timeout while waiting on RPC response.
>>> 2013-06-04 12:51:44 WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 56.108887 sec
>>> 2013-06-04 12:51:44 INFO [quantum.agent.dhcp_agent] Synchronizing state
>>>
>>>
>>>
>>>
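The WARNING line in the log above can be checked with a little arithmetic. A minimal sketch, assuming the agent's default report_interval of 4 seconds and an RPC timeout of 60 seconds: the state report started at about 12:50:44 and timed out at 12:51:44, so the run took about 60 seconds and overran its 4-second slot by about 56 seconds, matching the logged 56.108887. outlasted_by is a hypothetical helper that mirrors the idea behind the warning (overrun = elapsed time minus interval), not the loopingcall code itself.

```python
from datetime import datetime

# Assumed agent report_interval in seconds.
REPORT_INTERVAL = 4.0


def outlasted_by(start, end, interval=REPORT_INTERVAL):
    """Seconds by which one run of a fixed-interval task overran its slot.

    Returns 0.0 when the run finished within the interval.
    """
    elapsed = (end - start).total_seconds()
    return max(elapsed - interval, 0.0)


# Timestamps taken from the log: the agent started at 12:50:44 and the
# blocked state report timed out at 12:51:44 (sub-second parts are unknown).
start = datetime(2013, 6, 4, 12, 50, 44)
end = datetime(2013, 6, 4, 12, 51, 44)
print(outlasted_by(start, end))  # 56.0
```

Since the agent had already logged "Connected to AMQP server" at 12:50:44, the 60 seconds appear to be spent waiting for a reply to report_state rather than waiting to connect to qpid.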
>>> On Mon, Jun 3, 2013 at 11:28 PM, S Manoo <smanoo76 at gmail.com> wrote:
>>>
>>>
>>>
>>> *dhcp-agent.log:*
>>> [root@grizzly ~(keystone_admin)]# cat dhcp-agent.log
>>> 2013-06-03 22:27:09 INFO [quantum.common.config] Logging enabled!
>>> 2013-06-03 22:27:09 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.0.0.19:5672
>>> 2013-06-03 22:27:09 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.0.0.19:5672
>>> 2013-06-03 22:27:10 INFO [quantum.agent.dhcp_agent] DHCP agent started
>>> 2013-06-03 22:28:10 ERROR [quantum.agent.dhcp_agent] Failed reporting state!
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.6/site-packages/quantum/agent/dhcp_agent.py", line 700, in _report_state
>>>     self.agent_state)
>>>   File "/usr/lib/python2.6/site-packages/quantum/agent/rpc.py", line 66, in report_state
>>>     topic=self.topic)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/proxy.py", line 80, in call
>>>     return rpc.call(context, self._get_topic(topic), msg, timeout)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/__init__.py", line 140, in call
>>>     return _get_impl().call(CONF, context, topic, msg, timeout)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 611, in call
>>>     rpc_amqp.get_connection_pool(conf, Connection))
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 613, in call
>>>     rv = list(rv)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 555, in __iter__
>>>     self.done()
>>>   File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
>>>     self.gen.next()
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 552, in __iter__
>>>     self._iterator.next()
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 436, in iterconsume
>>>     yield self.ensure(_error_callback, _consume)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 380, in ensure
>>>     error_callback(e)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 421, in _error_callback
>>>     raise rpc_common.Timeout()
>>> Timeout: Timeout while waiting on RPC response.
>>> 2013-06-03 22:28:10 WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 56.133099 sec
>>> 2013-06-03 22:28:10 INFO [quantum.agent.dhcp_agent] Synchronizing state
>>> [root@grizzly ~(keystone_admin)]#
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> rhos-list mailing list
>>> rhos-list at redhat.com
>>> https://www.redhat.com/mailman/listinfo/rhos-list
>>
>