[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

[Pulp-list] QPID Issue Summary



All,

As you all know, we have seen some issues with QPID over the past few months. I'd like to summarize them and inform everyone were we stand. The primary symptom is that goferd starts consuming ~100% CPU. I've also seen cases where httpd starts consuming ~60% CPU as well. During my investigation, I've determined that there a (2) variations of this.

See attached pulp.log for error messages.

SUMMARY:

#1. python-qpid starts spewing error messages in the log which do not refer to a problem with the broker. They instead suggest (to me) a bug in the python driver. Or, a state mismatch between python-qpid and the broker. Currently, I cannot reproduce this on demand. I've heard of this on both i386 and x86_64.

#2. python-qpid starts spewing error messages in the log which refer to a problem with the broker (and/or contained queues). Basically, we're getting a "ConnectionError: Enqueue capacity threshold exceeded on queue" on a temporary (durable) queue associated with a topic subscription. Using qpid-stat, I determined that the queue had only 11 messages queued. Then, using qpid-stat, I purged the queue. But, still having the same problem which seems odd since the queue appears to be empty. I verified that even with 11 messages queued, the qpid consumer was getting an "Empty" exception trying to fetch messages. None of this makes sense. See (attached) pulp.log. The cause of this issue was thought to be (by qpid development team) that the root filesystem was full. However, we're seeing this again on machines with FS not nearly full. A workaround is to uninstall qpid-cpp-server-store. After doing this, the issue seems to be gone. Note, qpid-cpp-server-store is required by pulp so in order to do this, you need to either run from dev-env or hack & respin the rpm without the dep. I've heard of this on only x86_64.


NEXT STEPS:

=== #1 ===
A. I've sent an urgent email to Rafael Schloming who write the python-qpid driver asking him to take a look at the stack traces and try to shed some light on this.

B. Considered that the driver could be getting corrupted by concurrency issues, I updated the gofer stress test over the weekend to hammer gofer/qpid using large numbers of threads. Each thread creates agent proxies and invokes a lot of methods. Inspection of python-qpid code clearly shows that it /should/ be reentrant. All tests ran clean.

C. Replaced the goferd code profiler (cProfile) with YAPPI [1] in attempt to get better profiling information beyond the main thread. cProfile & Profile suck for multi-threaded applications and you have to figure this out the hard way. Any other suggestions appreciated. Today, I'll reproduce condition #2 and at least try to see which thread/function is eating the CPU.


=== #2 ===

A. So far, only know about this happening on Preethi's and Sayli's machines. I want to recreate on another box.
B. Re-engage QPID development team to help resolve.



[1] https://bugzilla.redhat.com/show_bug.cgi?id=688662
[2] https://bugzilla.redhat.com/show_bug.cgi?id=689573
[3] http://code.google.com/p/yappi/
qpid-cpp-server-0.8-4.fc14.i686
python-qpid-0.8-3.fc14.noarch
qpid-tools-0.8-4.fc14.noarch
qpid-cpp-client-0.8-4.fc14.i686
qpid-cpp-server-ssl-0.8-4.fc14.i686
qpid-cpp-client-devel-0.8-4.fc14.i686
qpid-cpp-client-ssl-0.8-4.fc14.i686
ruby-qpid-0.8-1.fc14.i686


2011-03-25 13:39:44,135 [ERROR][heartbeat] run() @ consumer.py:69 - failed:
Message(ttl=9.999, durable=True, content='{\n  "origin": "c467bc1d-6533-4080-8a84-69bbccef7608", \n  "heartbeat": {\n    "uuid": "jortel", \n    "next": 10\n  }, \n  "version": "0.2", \n  "sn": "40bb4978-4ac1-4696-805a-cfacdae459ce"\n}')
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line 63, in run
    try:
  File "<string>", line 6, in fetch
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 997, in fetch
    self._ecwait(lambda: self.linked)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
    result = self._ewait(lambda: self.closed or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 960, in _ewait
    result = self.session._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 196, in _ewait
    self.check_error()
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 189, in check_error
    raise self.error
InternalError: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 504, in dispatch
    self.engine.dispatch()
  File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 808, in dispatch
    self.process(ssn)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 1036, in process
    self.send(snd, msg)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 1209, in send
    rk = _snd._routing_key
AttributeError: Attachment instance has no attribute '_routing_key'

========================

2011-03-25 15:46:30,135 [DEBUG][heartbeat] run() @ consumer.py:68 - ready
2011-03-25 15:46:30,136 [ERROR][heartbeat] __fetch() @ consumer.py:91 - failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line 86, in __fetch
    return self.receiver.fetch(timeout=self.WAIT)
  File "<string>", line 6, in fetch
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 997, in fetch
    self._ecwait(lambda: self.linked)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
    result = self._ewait(lambda: self.closed or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 960, in _ewait
    result = self.session._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 196, in _ewait
    self.check_error()
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 189, in check_error
    raise self.error
InternalError: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 504, in dispatch
    self.engine.dispatch()
  File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 807, in dispatch
    self.attach(ssn)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 850, in attach
    self.link(rcv, self._in, rcv.source)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 901, in link
    dir.do_unlink(sst, lnk, _lnk)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 269, in do_unlink
    sst.write_cmds(cmds, action)
AttributeError: 'NoneType' object has no attribute 'write_cmds'

========================
BK

2011-03-21 17:14:04,616 [ERROR][pulp-monitor] update() @ main.py:104 - plugin pulp
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gofer/agent/main.py", line 100, in update
    plugin.attach(uuid)
  File "/usr/lib/python2.7/site-packages/gofer/agent/plugin.py", line 214, in attach
    if not uuid:
  File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line 90, in __init__
    Endpoint.__init__(self, **other)
  File "/usr/lib/python2.7/site-packages/gofer/messaging/endpoint.py", line 53, in __init__
    self.open()
  File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line 115, in open
    receiver = session.receiver(address)
  File "<string>", line 6, in receiver
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 600, in receiver
    if not self.closed and self.connection._connected:
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 957, in _ewait
    return self.error
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 550, in _ewait
    return self.error
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 194, in _ewait
    def _ewait(self, predicate, timeout=None):
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 187, in check_error
    if self.error:
InternalError: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 616, in write
    op.dispatch(self)
  File "/usr/lib/python2.7/site-packages/qpid/ops.py", line 84, in dispatch
    getattr(target, handler)(self, *args)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 811, in do_session_detached
    sst = self._sessions.pop(dtc.channel)
KeyError: 'pop(): dictionary is empty'

========================
Preethi/sayli

  File "/usr/lib/python2.7/site-packages/gofer/messaging/producer.py", line 57,
in send
    sender = self.session().sender(address)
  File "<string>", line 6, in sender
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
576, in sender
    sender._ewait(lambda: sender.linked)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
783, in _ewait
    result = self.session._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
550, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
194, in _ewait
    self.check_error()
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
187, in check_error
    raise self.error
ConnectionError: Enqueue capacity threshold exceeded on queue
"49ba2548-098b-4784-834b-98323b2d113a:0.0". (JournalImpl.cpp:616)(501)


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]