[Freeipa-users] 389-ds memory usage

Wed Jun 6 07:30:00 UTC 2012

On Wed, June 6, 2012 00:54, JR Aquino wrote:
> On Jun 5, 2012, at 3:42 PM, Sigbjorn Lie wrote:
>
>
>> On 06/06/2012 12:26 AM, JR Aquino wrote:
>>
>>> On Jun 5, 2012, at 3:12 PM, Sigbjorn Lie wrote:
>>>
>>>
>>>> On 06/05/2012 11:44 PM, JR Aquino wrote:
>>>>
>>>>> On Jun 5, 2012, at 1:54 PM, Sigbjorn Lie wrote:
>>>>>
>>>>>
>>>>>> On 06/05/2012 10:42 PM, Steven Jones wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>>
>>>>>>> This bug has pretty much destroyed my IPA deployment.......I had a pretty bad
>>>>>>> memory leak, had to reboot every 36 hours...made worse by trying later 6.3? rpms,
>>>>>>> which didn't fix the leak, and it went split brain........2 months and no fix....boy
>>>>>>> did that open up a can of worms.....
>>>>>>>
>>>>>>> :/
>>>>>>>
>>>>>>>
>>>>>>> In my case I can't see how it's churn as I have so few entries (<50) and I'm adding no
>>>>>>> more items at present....unless a part of IPA is "replicating and diffing" in the
>>>>>>> background to check consistency?
>>>>>>>
>>>>>>> I also have only one-way replication now at most, master to replica, and no memory
>>>>>>> leak shows in Munin at present.........
>>>>>>>
>>>>>>> but I seem to be faced with a rebuild from scratch.......
>>>>>> Did you do the "max entry cache size" tuning? If you did, what did you set it to?
>>>>>>
>>>>>>
>>>>>> Did you do any other tuning from the 389-ds tuning guide?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Rgds,
>>>>>> Siggi
>>>>>>
>>>>> When I had similar problems using Fedora (not Red Hat or CentOS) my underlying issues
>>>>> were: managed entries firing off any time an object was updated (every time someone
>>>>> successfully authenticates, Kerberos updates the user object, which in turn would touch
>>>>> the mepmanaged entry for the user's private group).  Similar things happened when
>>>>> hostgroups were modified...
>>>>>
>>>>> This was further complicated by inefficiencies in the way that slapi-nis was processing
>>>>> the compat pieces for the sudo rules and the netgroups (which are automatically created
>>>>> from every hostgroup).
>>>>>
>>>>> Thus, when memberof fired off, slapi-nis recomputed a great deal of its chunk...
>>>>>
>>>>>
>>>>> After getting those issues resolved, I tuned the max entry cache size.  But it took all
>>>>> the fixes to finally resolve the memory creep problem.
>>>>>
>>>>> It is not at all clear to me whether or not the bug fixes for my problem have made it up
>>>>> into Red Hat / CentOS though...  The slapi-nis versions definitely don't line up between
>>>>> Fedora and Red Hat / CentOS...
>>>>>
>>>>> Perhaps Nalin or Rich can speak to some of that.
>>>>>
>>>>>
>>>>> The bug itself was easiest to replicate with _big_ changes, like deleting a group that had
>>>>> a great number of members for example, but the symptoms were similar for me in
>>>>> day-to-day operation, resulting in memory consumption that was never freed.
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=771493
>>>>>
>>>>>
>>>>> Are either of you currently utilizing sudo?
>>>>>
>>>>>
>>>> I read your bug report a while back, and made sure that slapi-nis was disabled.
>>>>
>>>>
>>>> I have tuned my cache size to 256MB. I believe that should be OK as my cache hit ratio sits
>>>> at 97-99%?
>>>>
>>>> I understand you have a fairly large deployment, what cache size are you using? Are you
>>>> using Fedora or Red Hat / CentOS as your production environment?
>>>>
>>>> I do not use sudo with IPA yet, I am planning to do that later. Are there any issues I
>>>> should be aware of with sudo integration?
>>>>
>>>> Rich/Nalin,
>>>> Was there a bug in managed entries that's been fixed in the current 389-ds versions
>>>> available in Red Hat / CentOS  6?
>>>>
>>>>
>>>> Regards,
>>>> Siggi
>>>>
>>>>
>>> Ya it is true that I do have a large environment, but some of the hurdles that I had to jump
>>> appeared to be ones that weren't related so much to the number of hosts I had, but rather
>>> their amount of activity.  I.e. automated single-sign on scripts, people authenticating,
>>> general binds taking place all over...
>>>
>>> I am using Fedora with FreeIPA 2.2 pending a migration to RHEL 6.3 and IPA 2.2
>>>
>>>
>>> My measurements... ;)
>>>
>>>
>>> dn: cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
>>> objectClass: top
>>> objectClass: extensibleObject
>>> cn: monitor
>>> database: ldbm database
>>> readonly: 0
>>> entrycachehits: 904077
>>> entrycachetries: 923802
>>> entrycachehitratio: 97
>>> currententrycachesize: 79607895
>>> maxentrycachesize: 104857600
>>> currententrycachecount: 10301
>>> maxentrycachecount: -1
>>> dncachehits: 3
>>> dncachetries: 10302
>>> dncachehitratio: 0
>>> currentdncachesize: 1861653
>>> maxdncachesize: 10485760
>>> currentdncachecount: 10301
>>> maxdncachecount: -1
>>>
>>>
>>>
>> Ok, we have a fair amount of logons happening too with Nagios running lots of ssh connections
>> to the hosts, as well as normal users. Can't really disable that. :)
>>
>> I see your cache size is 100MB, that's less than half of mine. I increased my cache quite a bit
>> as I was advised by Rich about a bug that's not been fixed in the RHEL 6.2 version of 389-ds,
>> related to when entries in the cache are being removed to make room for new cache entries. I
>> was hoping that issue would go away with a large cache size.
>>
>
> Right, I was advised of the same.  Though it sounds like you're not hitting your limit and are
> still seeing the memory creep...
>
> This makes me question the other factors.  Nagios checking everything (probably every 5 mins?)
> might be a good source of activity... Though I wonder how best to visualize what is taking up the
> memory...
>
> Have you turned on auditing at all?  One of the things I was able to deduce from rampant activity
> was based on what I was seeing modified via the audit log.  Recurring patterns coming in big
> waves... things like that.

I have not turned on any explicit auditing, but I do use SELinux on the IPA servers, and have the
/var/log/audit/audit.log from all the SELinux activity. Is that what you're referring to?
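
If the audit log meant above is the directory server's own log of modifications rather than
the SELinux one (an assumption on my part), the "recurring patterns" could be tallied with
something like the rough sketch below. The record layout (`time:`, `dn:` and `changetype:`
lines, with records separated by blank lines) and every entry in SAMPLE are hypothetical and
for illustration only; none of it is output from a server in this thread.

```python
import re
from collections import Counter


def tally_modified_dns(audit_text):
    """Count how many change records touch each DN.

    Assumes LDIF-like records separated by blank lines, each
    containing one 'dn:' line (an assumption, see above).
    """
    counts = Counter()
    for record in re.split(r"\n\s*\n", audit_text.strip()):
        for line in record.splitlines():
            if line.lower().startswith("dn:"):
                counts[line.split(":", 1)[1].strip()] += 1
                break  # only the first dn: line identifies the record
    return counts


# Hypothetical sample records; DNs and attribute values are made up.
SAMPLE = """\
time: 20120606073000
dn: uid=nagios,cn=users,cn=accounts,dc=example,dc=com
changetype: modify
replace: krbLastSuccessfulAuth
krbLastSuccessfulAuth: 20120606073000Z
-

time: 20120606073005
dn: uid=nagios,cn=users,cn=accounts,dc=example,dc=com
changetype: modify
replace: krbLastSuccessfulAuth
krbLastSuccessfulAuth: 20120606073005Z
-

time: 20120606073010
dn: cn=webservers,cn=hostgroups,cn=accounts,dc=example,dc=com
changetype: modify
add: member
member: fqdn=web1.example.com,cn=computers,cn=accounts,dc=example,dc=com
-
"""

if __name__ == "__main__":
    # Print the most frequently modified DNs first.
    for dn, n in tally_modified_dns(SAMPLE).most_common():
        print(n, dn)
```

A DN that dominates the tally at regular intervals (every 5 minutes, say) would point at
whatever is driving the churn.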

Yes, most of the Nagios checks are being done every 5 minutes.

I agree, I'm not sure how to proceed in troubleshooting and finding the memory leak.
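
As a first step, the cache counters from the cn=monitor entry quoted earlier in this thread
could be tracked over time and compared with the process's memory use. The sketch below only
parses the text of such an entry and works out the hit ratio and how full the entry cache is.
The function names are made up for illustration, and the input is the subset of figures quoted
above; how the entry is fetched from the server is left out.

```python
def parse_monitor(ldif_text):
    """Collect the integer-valued attributes of a cn=monitor entry."""
    attrs = {}
    for line in ldif_text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # Keep only "name: integer" lines; dn, cn, objectClass are skipped.
        if sep and value.lstrip("-").isdigit():
            attrs[key.strip().lower()] = int(value)
    return attrs


def cache_summary(attrs):
    """Return (hit ratio in percent, entry cache fill in percent)."""
    tries = attrs["entrycachetries"]
    hit_pct = 100.0 * attrs["entrycachehits"] / tries if tries else 0.0
    fill_pct = 100.0 * attrs["currententrycachesize"] / attrs["maxentrycachesize"]
    return hit_pct, fill_pct


# Figures copied from the cn=monitor entry quoted earlier in the thread.
MONITOR = """\
dn: cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
cn: monitor
entrycachehits: 904077
entrycachetries: 923802
currententrycachesize: 79607895
maxentrycachesize: 104857600
currententrycachecount: 10301
maxentrycachecount: -1
"""

if __name__ == "__main__":
    hit_pct, fill_pct = cache_summary(parse_monitor(MONITOR))
    print("entry cache hit ratio: %.1f%%" % hit_pct)
    print("entry cache fill:      %.1f%%" % fill_pct)
```

If the fill figure stays well under 100% while the process keeps growing, the growth is
probably happening somewhere other than the entry cache, which would fit the observation
above about not hitting the limit.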

Rgds,
Siggi