[rhos-list] Problems with quantum and dhcp-agent

Steven Dake sdake at redhat.com
Wed Jun 5 20:14:13 UTC 2013


On 06/05/2013 12:14 AM, Gary Kotton wrote:
> On 06/05/2013 01:17 AM, Steven Dake wrote:
>> On 06/04/2013 01:09 PM, S Manoo wrote:
>>> Looking into this further, I'm observing the same error message 
>>> about timeouts talking to qpid in dhcp-agent.log after every 
>>> restart. Perhaps this is why I'm unable to get any DHCP responses to 
>>> instances? Any suggestions on what's causing this and where I might 
>>> look to troubleshoot it further?
>
> When a host is restarted, each process needs to register with the 
> message broker. If you are running all of the services on the same 
> host, they can only connect once the qpid service is up and running, 
> which usually takes a few seconds after reboot. If a service does not 
> receive an answer from the qpid service, it waits and retries. This 
> is why you see the timeouts. The wait is incremental. I have seen that 
> all services are usually able to connect within a minute of booting a 
> host (we should try to reduce this time).
>
> Please note that the quantum CLI has a command for this: quantum 
> agent-list. It lists the agents, their status and the hosts that they 
> are running on.
>
> If you spin up an instance after the dhcp agent is up and running do 
> you see the problem?
>
>>>
>> S Manoo,
>>
>> We may have just fixed a bug related to this problem which is not 
>> fixed in the preview.  Please try the workaround in this bugzilla:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=970453
>
> This fix is good for an all-in-one setup but will not help if the DHCP 
> agent is running on another host. In Quantum we have the notion of a 
> network node. Please look at 
> https://docs.google.com/drawings/d/167gegaoTBZpd318b2JTgF_Qi9YdkIX8pcQ6YBJLUtGY/edit?usp=sharing
>
> If the message broker goes down (say, because of a host reboot or 
> network problems) then the dhcp agent will try to reconnect.
>
Gary,

I have found that the dhcp agent stops responding permanently in this 
condition on an all-in-one setup.  Perhaps the same is true for multinode 
(i.e. the retry logic doesn't work as expected).  I don't have multiple 
nodes to test, but it might be worth double-checking if you do.
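
For reference, the retry behaviour Gary describes (wait, retry, and wait a 
bit longer each time) can be sketched roughly as below. This is only an 
illustration of the idea, not the code in impl_qpid.py: the function name, 
its parameters and the linear step are all my own assumptions.

```python
import time


def connect_with_backoff(connect, first_wait=1.0, step=1.0, max_wait=60.0,
                         max_attempts=None, sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure.

    Hypothetical helper for illustration only; the names and defaults are
    assumptions, not taken from the Quantum source.
    """
    wait = first_wait
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except Exception:  # a real agent would catch only broker errors
            # Give up only if the caller asked for a bounded number of tries.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(wait)
            # Incremental wait: grow by a fixed step, capped at max_wait.
            wait = min(wait + step, max_wait)
```

With max_attempts left at None the loop never gives up, which is the 
behaviour I would expect from an agent that is supposed to survive a broker 
restart. Passing a fake sleep makes it easy to test without waiting.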

Regards
-steve

>>
>> Regards
>> -steve
>>
>>
>>> */var/log/quantum/dhcp-agent.log:*
>>> 2013-06-04 12:50:44     INFO [quantum.common.config] Logging enabled!
>>> 2013-06-04 12:50:44     INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on localhost:5672
>>> 2013-06-04 12:50:44     INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on localhost:5672
>>> 2013-06-04 12:50:44     INFO [quantum.agent.dhcp_agent] DHCP agent started
>>> 2013-06-04 12:51:44    ERROR [quantum.agent.dhcp_agent] Failed reporting state!
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.6/site-packages/quantum/agent/dhcp_agent.py", line 700, in _report_state
>>>     self.agent_state)
>>>   File "/usr/lib/python2.6/site-packages/quantum/agent/rpc.py", line 66, in report_state
>>>     topic=self.topic)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/proxy.py", line 80, in call
>>>     return rpc.call(context, self._get_topic(topic), msg, timeout)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/__init__.py", line 140, in call
>>>     return _get_impl().call(CONF, context, topic, msg, timeout)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 611, in call
>>>     rpc_amqp.get_connection_pool(conf, Connection))
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 613, in call
>>>     rv = list(rv)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 555, in __iter__
>>>     self.done()
>>>   File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
>>>     self.gen.next()
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 552, in __iter__
>>>     self._iterator.next()
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 436, in iterconsume
>>>     yield self.ensure(_error_callback, _consume)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 380, in ensure
>>>     error_callback(e)
>>>   File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 421, in _error_callback
>>>     raise rpc_common.Timeout()
>>> Timeout: Timeout while waiting on RPC response.
>>> 2013-06-04 12:51:44  WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 56.108887 sec
>>> 2013-06-04 12:51:44     INFO [quantum.agent.dhcp_agent] Synchronizing state
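
To check whether the failure repeats after the first minute or only happens 
once per restart, a small script along these lines could count the failed 
state reports in the log. It is a rough sketch that assumes it is fed the 
unwrapped lines of the real /var/log/quantum/dhcp-agent.log (not the 
mail-wrapped copy above) and that the line layout matches what is shown 
here; the function name and regular expression are my own.

```python
import re

# Matches lines such as:
# 2013-06-04 12:51:44    ERROR [quantum.agent.dhcp_agent] Failed reporting state!
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s*(?P<msg>.*)$"
)


def summarize(lines):
    """Return (timestamps of failed state reports, number of RPC timeouts)."""
    failures = []
    timeouts = 0
    for line in lines:
        line = line.rstrip("\n")
        match = LOG_LINE.match(line)
        if (match and match.group("level") == "ERROR"
                and "Failed reporting state" in match.group("msg")):
            failures.append(match.group("ts"))
        elif line.startswith("Timeout: Timeout while waiting on RPC response"):
            # Final line of the traceback, which carries no timestamp.
            timeouts += 1
    return failures, timeouts
```

Calling summarize(open("/var/log/quantum/dhcp-agent.log")) would then give 
the times of each failed report. A single entry shortly after startup 
points at the broker not being ready yet; entries that keep appearing 
suggest the agent never recovers.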
>>>
>>>
>>>
>>>
>>> On Mon, Jun 3, 2013 at 11:28 PM, S Manoo <smanoo76 at gmail.com> wrote:
>>>
>>>
>>>
>>>     *dhcp-agent.log:*
>>>     [root@grizzly ~(keystone_admin)]# cat dhcp-agent.log
>>>     2013-06-03 22:27:09     INFO [quantum.common.config] Logging enabled!
>>>     2013-06-03 22:27:09     INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.0.0.19:5672
>>>     2013-06-03 22:27:09     INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.0.0.19:5672
>>>     2013-06-03 22:27:10     INFO [quantum.agent.dhcp_agent] DHCP agent started
>>>     2013-06-03 22:28:10    ERROR [quantum.agent.dhcp_agent] Failed reporting state!
>>>     Traceback (most recent call last):
>>>       File "/usr/lib/python2.6/site-packages/quantum/agent/dhcp_agent.py", line 700, in _report_state
>>>         self.agent_state)
>>>       File "/usr/lib/python2.6/site-packages/quantum/agent/rpc.py", line 66, in report_state
>>>         topic=self.topic)
>>>       File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/proxy.py", line 80, in call
>>>         return rpc.call(context, self._get_topic(topic), msg, timeout)
>>>       File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/__init__.py", line 140, in call
>>>         return _get_impl().call(CONF, context, topic, msg, timeout)
>>>       File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 611, in call
>>>         rpc_amqp.get_connection_pool(conf, Connection))
>>>       File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 613, in call
>>>         rv = list(rv)
>>>       File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 555, in __iter__
>>>         self.done()
>>>       File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
>>>         self.gen.next()
>>>       File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 552, in __iter__
>>>         self._iterator.next()
>>>       File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 436, in iterconsume
>>>         yield self.ensure(_error_callback, _consume)
>>>       File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 380, in ensure
>>>         error_callback(e)
>>>       File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 421, in _error_callback
>>>         raise rpc_common.Timeout()
>>>     Timeout: Timeout while waiting on RPC response.
>>>     2013-06-03 22:28:10  WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 56.133099 sec
>>>     2013-06-03 22:28:10     INFO [quantum.agent.dhcp_agent] Synchronizing state
>>>     [root@grizzly ~(keystone_admin)]#
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> rhos-list mailing list
>>> rhos-list at redhat.com
>>> https://www.redhat.com/mailman/listinfo/rhos-list
>>
>>
>>
>
