
Re: [Freeipa-users] Large slow down when using IPA



On 01/02/2012 01:47 PM, Simo Sorce wrote:
> On Mon, 2012-01-02 at 11:54 -0900, Erinn Looney-Triggs wrote:
>> On 01/02/2012 11:40 AM, Jakub Hrozek wrote:
>>> On Mon, Jan 02, 2012 at 12:53:29PM -0500, Simo Sorce wrote:
>>>> On Mon, 2012-01-02 at 17:29 +0100, Jakub Hrozek wrote:
>>>>> On Mon, Jan 02, 2012 at 10:00:02AM -0500, Simo Sorce wrote:
>>>>>> On Sat, 2011-12-31 at 01:35 -0900, Erinn Looney-Triggs wrote:
>>>>>>> On 12/30/2011 07:19 PM, JR Aquino wrote:
>>>>>>>>
>>>>>>>> On Dec 30, 2011, at 5:45 PM, Erinn Looney-Triggs wrote:
>>>>>>>>
>>>>>>>>> I have been slowly rolling out FreeIPA to my systems, trying to track
>>>>>>>>> differences/changes. One of the most noticeable has been a large slow
>>>>>>>>> down in file access times.
>>>>>>>>>
>>>>>>>>> Let me explain as best as I can. I use AIDE to track the file system
>>>>>>>>> (think tripwire) and it runs checks once a day. During these checks it
>>>>>>>>> is scanning (almost) the entire file system and comparing it to a stored
>>>>>>>>> database. On a moderately powered system with ~151k files, an AIDE run
>>>>>>>>> will usually take ~30 minutes. After the system becomes an IPA client
>>>>>>>>> the same run will generally take ~90-120 minutes. Un-install the
>>>>>>>>> ipa-client, back to ~30 minutes for an AIDE run.
>>>>>>>>>
>>>>>>>>> Now clearly a lot of lookups are being done for user names and group
>>>>>>>>> names, and this will have a performance hit that is dependent on the
>>>>>>>>> network. However, the odd thing is that even when running on the IPA
>>>>>>>>> server itself the slowdown is still the same.
>>>>>>>>>
>>>>>>>>> Not sure if this is an IPA problem, an SSSD problem, a bit of both, or
>>>>>>>>> neither, perhaps it is just the way it is, but a slowdown of 3-4x seems
>>>>>>>>> a bit much to me. Clearly the results are not scientific, however, they
>>>>>>>>> have been generally reproducible since I started rolling IPA out.
>>>>>>>>>
>>>>>>>>> As a side note this slowdown has also broken bacula backups, as the
>>>>>>>>> bacula client is scanning the filesystem for change (using accurate
>>>>>>>>> backups) the director times out.
>>>>>>>>>
>>>>>>>>> Any thoughts, or opinions? Workarounds etc? I have checked to make sure
>>>>>>>>> that SSSD caching is enabled, and functional.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -Erinn
>>>>>>>>
>>>>>>>> I am assuming that these are all running as local users.
>>>>>>>>
>>>>>>>> From the sssd.conf man page in the nss section:
>>>>>>>>
>>>>>>>> filter_users, filter_groups (string)
>>>>>>>>            Exclude certain users from being fetched from the sss NSS
>>>>>>>>            database. This is particularly useful for system accounts.
>>>>>>>>            This option can also be set per-domain or include
>>>>>>>>            fully-qualified names to filter only users from the
>>>>>>>>            particular domain.
>>>>>>>>
>>>>>>>>            Default: root
>>>>>>>>
>>>>>>>>
>>>>>>>> Try adding this to your sssd.conf:
>>>>>>>>
>>>>>>>> [nss]
>>>>>>>>            filter_groups = root,bacula,aide,otherdaemonuser <-as needed
>>>>>>>>            filter_users = root,bacula,aide,otherdaemonuser <- as needed
>>>>>>>>
>>>>>>>> Let me know if that solves your issue.
>>>>>>>>
>>>>>>>
>>>>>>> Thanks for pointing that out, completely missed that option! Wouldn't it
>>>>>>> be sweet to have an option that say looked at /etc/login.defs and just
>>>>>>> didn't lookup anything under MIN_UID, on the assumption that those are
>>>>>>> system accounts? Certainly would stop a lot of lookups I imagine.
>>>>>>
>>>>>> We already have range options (min_id/max_id), but unfortunately that
>>>>>> doesn't help when an application asks for information by name.
>>>>>> You either permanently blacklist such a name or you have to do the
>>>>>> lookup and then find out that either a) it does not exist, or b) it
>>>>>> has to be filtered out.
>>>>>>
>>>>>>> Of course you would have to leave it as an option and probably default
>>>>>>> it to off given the odd things people do with their systems.
>>>>>>
>>>>>> Indeed sssd used to enforce a min id range of 1000 and we turned it off
>>>>>> in the default configuration due to issues with weird configurations.
>>>>>>
>>>>>> Can you try using both min_id and filter_users and see if it makes any
>>>>>> difference in your case ?
>>>>>>
>>>>>> Simo.
>>>>>>
>>>>>
>>>>> Even when performing getpwuid() calls, SSSD first looks up the user
>>>>> information, reads the UID LDAP attribute and then checks the UID value
>>>>> from LDAP against min_id/max_id values.
>>>>
>>>> Not according to my reading of the sources, if you look into
>>>> nss_cmd_getpwuid_search() you'll see that we proceed only if we first
>>>> pass the min_id/max_id range check, otherwise we return ENOENT.
>>>>
>>>> Simo.
>>>
>>> Sorry, you're right and I need to warm up my brain a little more after
>>> the Christmas break.
>>>
>>> Thanks!
>>
>> I am going through some testing now to try and get you folks something
>> more definitive. However, from an early test adding users/groups to
>> filter_* seemed to reduce the performance hit slightly, but it did not
>> take it anywhere near the levels it was at before sssd was in place.
>>
>> Like I said I will continue to test and get you folks some more
>> definitive results, probably later today. Thanks for all the info and
>> feedback.
> 
> Hi Erinn,
> can you please tell what's the baseline you are comparing against ?
> 
> Is it nss_ldap ? With or without nscd ?
> 
> Simo.
> 


Here is what I am comparing, and hoping not to make myself look like an
idiot in the process :).

1. An ipa-client system with no excludes configured in sssd.
2. An ipa-client system with sssd excludes (filter_*) for users and
   groups with IDs below 500.
3. A non-ipa-client system, i.e. local accounts only.
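
To separate the cost of name lookups from the rest of an AIDE run, something like the rough sketch below could be run under each of the configurations above. It assumes the scan resolves each file's owner and group ID to a name through NSS, which is my guess at what AIDE and bacula do, not something I have confirmed. The function names and the sample limit are made up for illustration.

```python
import grp
import os
import pwd
import time


def collect_ids(root, limit=10000):
    """Walk `root` and return up to `limit` (uid, gid) pairs, one per file."""
    ids = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable, skip it
            ids.append((st.st_uid, st.st_gid))
            if len(ids) >= limit:
                return ids
    return ids


def time_lookups(ids):
    """Time only the NSS lookups (ID to name) for the collected pairs."""
    start = time.perf_counter()
    for uid, gid in ids:
        try:
            pwd.getpwuid(uid)
        except KeyError:
            pass  # no such user, the lookup cost is still measured
        try:
            grp.getgrgid(gid)
        except KeyError:
            pass  # no such group
    return time.perf_counter() - start
```

Calling time_lookups(collect_ids("/usr")) on the same tree in each configuration should show whether the lookups alone account for the difference, since the file system walk is kept out of the timed section.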

These tests are all being done on the same system. All I have at this
point is the fact that I started rolling out IPA (and thus sssd) to
clients and AIDE checks bombed in terms of the amount of time they took.
That and accurate bacula backups failed because it was taking too long
to compare the local files to the db. Now it could be unrelated, but I
hope not (in the not looking like an idiot category).

Now of course network traffic and lag times are involved here, but the
one key point that made me take notice here is that the times for an
AIDE check also bombed (at about the same ratio) on the IPA servers
themselves, thus pointing me to the idea that perhaps this wasn't all
network related.

Feel free to let me know if there is something smarter that can be done
here, or if you believe I am just barking up the wrong tree. It would be
interesting to test against nss_ldap etc. to see if things are about the
same. I can also test clients on the same network segment as the ipa
server (right now they are showing similar slowdowns as more physically
remote systems).

On the other hand the whole sssd link may be moot with the weird lastlog
thing I just found (see other message), if AIDE and bacula are trying to
process a 438GB file that could certainly slow them down. Nevertheless,
this is what I am doing.
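
A quick way to check whether that file is really that big on disk is to compare its apparent size with the space actually allocated. This is only a sketch: it assumes the file is sparse (as lastlog usually is) and that st_blocks is counted in 512-byte units, which is the case on Linux. The function name is made up.

```python
import os


def sparse_report(path):
    """Return (apparent_bytes, allocated_bytes, looks_sparse) for `path`."""
    st = os.stat(path)
    apparent = st.st_size
    # Assumption: st_blocks is in 512-byte units (true on Linux).
    allocated = st.st_blocks * 512
    return apparent, allocated, allocated < apparent
```

For example, sparse_report("/var/log/lastlog") (the path is an assumption about where the file lives) returns the two sizes and a flag. If the allocated size is tiny next to the apparent size, a tool that reads the file byte by byte would still have to chew through the whole apparent size unless it handles sparse files specially.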

-Erinn




