[Freeipa-users] 'Request is a replay'

Sun Sep 9 20:16:46 UTC 2012

On 09/08/2012 01:34 AM, Dmitri Pal wrote:
> On 07/26/2012 09:37 AM, Sigbjorn Lie wrote:
>> On 07/26/2012 02:53 PM, Rob Crittenden wrote:
>>> Sigbjorn Lie wrote:
>>>> On Wed, July 25, 2012 09:54, Sigbjorn Lie wrote:
>>>>> On Tue, July 24, 2012 20:29, Simo Sorce wrote:
>>>>>
>>>>>> On Tue, 2012-07-24 at 10:22 +0200, Sigbjorn Lie wrote:
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I keep seeing this error message, "Request is a replay", in our
>>>>>>> production environment in various services using Kerberos, such
>>>>>>> as ssh, sssd, automounter, squid and more, after the upgrade to
>>>>>>> RHEL 6.3 / IPA 2.2.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Jul 24 10:16:11 server027 sssd_be: GSSAPI Error: Unspecified GSS
>>>>>>> failure.  Minor code may
>>>>>>> provide more information (Request is a replay)
>>>>>>>
>>>>>>> Searching Google seems to suggest that this is a time-related
>>>>>>> error. However, we have NTP configured (IPA servers as NTP
>>>>>>> servers), synchronized to external NTP servers. There has been no
>>>>>>> issue before, and I cannot find any problem with the time being
>>>>>>> out of sync on the machines where this is happening.
>>>>>> This error usually appears only when the same request is found in
>>>>>> the replay cache. It shouldn't be related to time issues; in that
>>>>>> case you usually get a clock-skew error.
>>>>>>
>>>>>> Can you tell me what operation sssd was performing when you
>>>>>> caught that error? Can you check whether another identical
>>>>>> operation had been performed immediately before it?
>>>>>>
>>>>> That being said, I do have 1 IPA server (out of 3) with
>>>>> significantly higher CPU usage than the other 2. Its 15-minute load
>>>>> average sits between 0.85 and 0.95 the entire day, and the
>>>>> ns-slapd (389-ds) process is running at 100% most of the time.
>>>>>
>>>>> Load: 1.02, 0.94, 0.87
>>>>>
>>>>>
>>>>> In comparison, the other two IPA servers have a 15-minute average
>>>>> between 0.10 and 0.30 throughout the day, and the ns-slapd process
>>>>> is nowhere near such a CPU hog.
>>>>>
>>>>> On the server with the high load, even a command such as
>>>>> "ipactl status" can take up to 20 seconds to complete:
>>>>> "Directory Service: RUNNING" returns after a second or so, and
>>>>> listing the rest of the services takes the remaining 19 seconds.
>>>>>
>>>>> Also the web interface on this particular IPA server is rendered
>>>>> unusable, returning "Limits
>>>>> exceeded for the query" for almost any action.
>>>>>
>>>>> Restarting all the IPA services (ipactl restart) on the problematic
>>>>> host somewhat improves the situation; however, that particular
>>>>> server quickly returns to heavy load.
>>>>>
>>>>> Using logconv.pl to analyze the dirsrv access log file shows that
>>>>> the server in question has the lowest search rate, with
>>>>> 106 queries/min. The other servers have 710 search queries/sec and
>>>>> 168 queries/sec.
>>>>>
>>>>> For modifications, all the IPA servers have about 5-6 queries/sec.
>>>>> For unindexed searches, the problematic server has the lowest
>>>>> number. It does, however, have more than twice as many GSSAPI
>>>>> binds as the other servers, with over 61000 GSSAPI binds over a
>>>>> 17 hour period.
>>>>>
>>>>>
>>>>> The problematic server is a physical server with 2 x AMD 2.4GHz
>>>>> Quad core CPU and 8GB of RAM.
>>>>>
>>>>>
>>>>> This issue is also impacting all the clients, where I see random
>>>>> hangs with anything involving an LDAP or Kerberos query to the IPA
>>>>> servers.
>>>>>
>>>>> Any suggestions?
>>>>>
>>>>>
>>>> Anyone ?
>>>>
>>>> I am starting to see the replay error when using the "ipa" CLI tool
>>>> as well, causing the request to fail with an error.
>>>>
>>>> ipa dnsrecord-show example.com hostname
>>>> ipa: ERROR: Local error: SASL(-1): generic failure: GSSAPI Error:
>>>> Unspecified GSS failure.  Minor
>>>> code may provide more information (Request is a replay)
>>> Sorry, I had started a reply yesterday and got side-tracked and never
>>> sent it.
>>>
>> I know that feeling. :)
>>> As for the one server being busier than the others, how are your
>>> clients configured? Are you using DNS SRV records?
>>>
>> We use DNS SRV records for every LDAP client that supports them: SSSD
>> and the Linux automounter. Solaris clients, Red Hat 5 using nss_ldap,
>> and NetApp use statically configured servers; however, this is the
>> second server in the server list for these machines. The primary
>> server gets more than 7x as many LDAP queries per minute, and the load
>> on the primary is much, much lower. All Kerberos clients use DNS SRV
>> for lookups, no static configuration there.
>>
>> I see some hiccups on the clients as well, when browsing NFS shares
>> (looking up UIDs), unlocking a client etc. These seem to be related
>> to the "faulty" IPA server with high load, as it seems to respond
>> very slowly to a lot of LDAP queries too. I tried removing it from
>> the DNS SRV records an hour ago, and things seem to run smoother. A
>> few services are still looking things up there though, and the load
>> on the "faulty" server is still high even with fewer clients. The
>> primary server, which is now receiving most of the queries, barely
>> increased at all in CPU usage.
>>
>>> For the replay, are your servers running in bare metal or in VMs? How
>>> about the clients? This sure seems like a time issue.
>> The time is configured as it has been for a long time. The physical
>> IPA servers are synchronized from external time sources, providing the
>> rest of the network with time. We have 2 physical servers and 1
>> virtual server. I have looked into the time, and it does seem like
>> everything is synchronized.
>>
>> The number of clients has not changed much over the last few months.
>>
>> These issues started appearing just after the upgrade to RHEL 6.3 /
>> IPA 2.2.
>>
>> Any suggestions on where to continue the troubleshooting?
>>
>>
> Was this issue ever resolved?
>
I believe this is related to slow responses from the krb server when
binding with GSSAPI, as documented in:

https://bugzilla.redhat.com/show_bug.cgi?id=845125

I'm waiting for an updated package to become available for RHEL 6.3. In
the meantime I have switched the Linux automounters over to a simple
bind to work around the issue.

Thanks for the follow up. :)

Rgds,
Siggi
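
Simo's point earlier in the thread, that a replay error and a clock-skew
error are two different checks, can be illustrated with a toy replay
cache. This is only a minimal sketch under stated assumptions, not how MIT
Kerberos or any real service implements its cache: the ReplayCache class,
the 300 second tolerance, the microsecond timestamps, and the
host/server027@EXAMPLE.COM principal are all hypothetical and chosen for
illustration.

```python
from dataclasses import dataclass, field

# Assumed tolerance of 300 seconds, expressed in microseconds.
SKEW_US = 300 * 1_000_000


@dataclass
class ReplayCache:
    """Toy replay cache: remembers requests seen inside the skew window."""

    skew_us: int = SKEW_US
    # Maps (client, request timestamp) to the time the entry may be dropped.
    seen: dict = field(default_factory=dict)

    def check(self, client: str, auth_time_us: int, now_us: int) -> str:
        # Forget entries that would be rejected for skew anyway.
        self.seen = {k: exp for k, exp in self.seen.items() if exp > now_us}

        # A timestamp outside the window is a clock-skew failure.
        if abs(now_us - auth_time_us) > self.skew_us:
            return "clock skew too great"

        # The same client and timestamp seen twice is a replay.
        key = (client, auth_time_us)
        if key in self.seen:
            return "request is a replay"

        self.seen[key] = auth_time_us + self.skew_us
        return "ok"


cache = ReplayCache()
principal = "host/server027@EXAMPLE.COM"  # hypothetical principal
ts = 1_343_117_771 * 1_000_000            # arbitrary timestamp in microseconds

print(cache.check(principal, ts, ts))                # ok
print(cache.check(principal, ts, ts + 1_000_000))    # request is a replay
print(cache.check(principal, ts, ts + 400_000_000))  # clock skew too great
```

In this model an identical request arriving twice inside the window is
rejected as a replay even when every clock agrees, which is why well
synchronized NTP does not rule the error out.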