[Freeipa-users] 389-ds memory usage

Tue Jun 5 22:54:40 UTC 2012

On Jun 5, 2012, at 3:42 PM, Sigbjorn Lie wrote:

> On 06/06/2012 12:26 AM, JR Aquino wrote:
>> On Jun 5, 2012, at 3:12 PM, Sigbjorn Lie wrote:
>> 
>>> On 06/05/2012 11:44 PM, JR Aquino wrote:
>>>> On Jun 5, 2012, at 1:54 PM, Sigbjorn Lie wrote:
>>>> 
>>>>> On 06/05/2012 10:42 PM, Steven Jones wrote:
>>>>>> Hi
>>>>>> 
>>>>>> This bug has pretty much destroyed my IPA deployment. I had a pretty bad memory leak and had to reboot every 36 hours. It was made worse by trying later (6.3?) rpms, which didn't fix the leak, and then it went split brain. 2 months and no fix... boy, did that open up a can of worms.
>>>>>> 
>>>>>> :/
>>>>>> 
>>>>>> In my case I can't see how it's churn, as I have so few entries (<50) and I'm adding no more items at present, unless a part of IPA is "replicating and diffing" in the background to check consistency?
>>>>>> 
>>>>>> I also have at most one-way replication now, master to replica, and no memory leak shows in Munin at present.
>>>>>> 
>>>>>> But I seem to be faced with a rebuild from scratch.
>>>>> Did you do the "max entry cache size" tuning? If you did, what did you set it to?
>>>>> 
>>>>> Did you do any other tuning from the 389-ds tuning guide?
>>>>> 
>>>>> 
>>>>> 
>>>>> Rgds,
>>>>> Siggi
>>>> When I had similar problems using Fedora (not Red Hat or CentOS), my underlying issues were managed entries firing off any time an object was updated (every time someone successfully authenticates, Kerberos updates the user object, which in turn would touch the mepManaged entry for the user's private group). Similar things happened when hostgroups were modified...
>>>> 
>>>> This was further complicated by inefficiencies in the way that slapi-nis was processing the compat pieces for the sudo rules and the netgroups (which are automatically created from every hostgroup).
>>>> 
>>>> Thus, when memberof fired off, slapi-nis recomputed a great deal of its chunk...
>>>> 
>>>> After getting those issues resolved, I tuned the max entry cache size.  But it took all the fixes to finally resolve the memory creep problem.
>>>> 
>>>> It is not at all clear to me whether or not the bug fixes for my problem have made it up into Red Hat / CentOS though...  The slapi-nis versions definitely don't line up between Fedora and Red Hat / CentOS...
>>>> 
>>>> Perhaps Nalin or Rich can speak to some of that.
>>>> 
>>>> The bug itself was easiest to replicate with _big_ changes, like deleting a group that had a great number of members, but the symptoms were similar for me in day-to-day operation: memory consumption that was never freed.
>>>> 
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=771493
>>>> 
>>>> Are either of you currently utilizing sudo?
>>>> 
>>> I read your bug report a while back, and made sure that slapi-nis was disabled.
>>> 
>>> I have tuned my cache size to 256MB. I believe that should be OK, as my cache hit ratio sits at 97-99%?
>>> 
>>> I understand you have a fairly large deployment. What cache size are you using? Are you using Fedora or Red Hat / CentOS as your production environment?
>>> 
>>> I do not use sudo with IPA yet; I am planning to do that later. Are there any issues I should be aware of with sudo integration?
>>> 
>>> Rich/Nalin,
>>> Was there a bug in managed entries that's been fixed in the current 389-ds versions available in Red Hat / CentOS  6?
>>> 
>>> 
>>> Regards,
>>> Siggi
>>> 
>> Ya, it is true that I do have a large environment, but some of the hurdles that I had to jump appeared to be related not so much to the number of hosts I had as to their amount of activity, i.e. automated single sign-on scripts, people authenticating, general binds taking place all over...
>> 
>> I am using Fedora with FreeIPA 2.2 pending a migration to RHEL 6.3 and IPA 2.2
>> 
>> My measurements... ;)
>> 
>> dn: cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
>> objectClass: top
>> objectClass: extensibleObject
>> cn: monitor
>> database: ldbm database
>> readonly: 0
>> entrycachehits: 904077
>> entrycachetries: 923802
>> entrycachehitratio: 97
>> currententrycachesize: 79607895
>> maxentrycachesize: 104857600
>> currententrycachecount: 10301
>> maxentrycachecount: -1
>> dncachehits: 3
>> dncachetries: 10302
>> dncachehitratio: 0
>> currentdncachesize: 1861653
>> maxdncachesize: 10485760
>> currentdncachecount: 10301
>> maxdncachecount: -1
>> 
>> 
> Ok, we have a fair amount of logons happening too with Nagios running lots of ssh connections to the hosts, as well as normal users. Can't really disable that. :)
> 
> I see your cache size is 100MB, that's less than half of mine. I increased my cache quite a bit as I was advised by Rich about a bug, not yet fixed in the RHEL 6.2 version of 389-ds, that is triggered when entries in the cache are removed to make room for new cache entries. I was hoping that issue would go away with a large cache size.
> 

Right, I was advised of the same.  Though it sounds like you're not hitting your limit and are still seeing the memory creep...
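
For what it's worth, here is a rough way to check how close the entry cache is to its limit, using the attribute names from the cn=monitor output I pasted above. It is only a sketch: the function names are my own, and it assumes you have saved the search result as plain "attr: value" text (mail-quote prefixes are tolerated).

```python
# Sample values copied from the cn=monitor entry quoted earlier in this thread.
SAMPLE = """\
entrycachehits: 904077
entrycachetries: 923802
entrycachehitratio: 97
currententrycachesize: 79607895
maxentrycachesize: 104857600
"""


def parse_monitor(text):
    """Turn 'attr: value' lines into a dict (last value wins on repeats)."""
    attrs = {}
    for line in text.splitlines():
        # Drop any leading '>' quote markers and surrounding whitespace.
        line = line.lstrip("> ").strip()
        key, sep, value = line.partition(": ")
        if sep:
            attrs[key.lower()] = value.strip()
    return attrs


def cache_stats(attrs):
    """Return the entry cache hit ratio and fill level as fractions (0..1)."""
    hits = int(attrs["entrycachehits"])
    tries = int(attrs["entrycachetries"])
    current = int(attrs["currententrycachesize"])
    maximum = int(attrs["maxentrycachesize"])
    return {
        "hit_ratio": hits / tries if tries else 0.0,
        "fill": current / maximum if maximum else 0.0,
    }


if __name__ == "__main__":
    stats = cache_stats(parse_monitor(SAMPLE))
    print("hit ratio: %.1f%%" % (stats["hit_ratio"] * 100))
    print("cache fill: %.1f%%" % (stats["fill"] * 100))
```

If the fill level stays well below 100% while the process keeps growing, the entry cache itself is probably not where the memory is going.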

This makes me question the other factors.  Nagios checking everything (probably every 5 mins?) might be a good source of activity... Though I wonder how best to visualize what is taking up the memory...

Have you turned on auditing at all?  One of the things I was able to deduce about the rampant activity came from what I was seeing modified in the audit log: recurring patterns coming in big waves, things like that.
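
To spot those waves, something like the sketch below can tally which entries and attributes get modified most often. It assumes the audit log is LDIF-like (a "dn:" line followed by "add:", "replace:" or "delete:" lines, with records separated by blank lines); the sample record and the function name are made up for illustration, and folded lines and base64 "dn::" values are ignored.

```python
from collections import Counter

# Hypothetical audit-log excerpt, for illustration only.
SAMPLE = """\
time: 20120605225440
dn: uid=alice,cn=users,cn=accounts,dc=example,dc=com
changetype: modify
replace: krbLastSuccessfulAuth
krbLastSuccessfulAuth: 20120605225440Z
-
replace: modifyTimestamp
modifyTimestamp: 20120605225440Z
-

time: 20120605225941
dn: uid=alice,cn=users,cn=accounts,dc=example,dc=com
changetype: modify
replace: krbLastSuccessfulAuth
krbLastSuccessfulAuth: 20120605225941Z
-
"""

MOD_OPS = ("add", "replace", "delete")


def tally_audit_log(text):
    """Count modifications per (dn, attribute) pair."""
    counts = Counter()
    dn = None
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if not line.strip():
            dn = None  # a blank line ends the current record
        elif sep and key == "dn":
            dn = value.strip()
        elif sep and key in MOD_OPS and dn is not None:
            counts[(dn, value.strip())] += 1
    return counts


if __name__ == "__main__":
    for (dn, attr), n in tally_audit_log(SAMPLE).most_common(10):
        print(n, attr, dn)
```

A handful of (dn, attribute) pairs dominating the list is the kind of pattern I mean.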