[Freeipa-devel] [PATCH] 940 apply some validation to some classes only

Mon Feb 20 20:31:22 UTC 2012

Jan Cholasta wrote:
> On 15.2.2012 15:33, Rob Crittenden wrote:
>> Jan Cholasta wrote:
>>> On 14.2.2012 22:16, Rob Crittenden wrote:
>>>> Jan Cholasta wrote:
>>>>> On 14.2.2012 16:44, Jan Cholasta wrote:
>>>>>> On 7.2.2012 20:25, Rob Crittenden wrote:
>>>>>>> Rob Crittenden wrote:
>>>>>>>> Jan Cholasta wrote:
>>>>>>>>> Dne 7.2.2012 09:27, Martin Kosek napsal(a):
>>>>>>>>>> On Mon, 2012-02-06 at 11:52 -0500, Rob Crittenden wrote:
>>>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>>>> On Fri, 2012-02-03 at 16:58 -0500, Rob Crittenden wrote:
>>>>>>>>>>>>> There is some validation that we only need to apply when an
>>>>>>>>>>>>> entry is
>>>>>>>>>>>>> being created, namely the key itself. This is to allow us to
>>>>>>>>>>>>> manage an
>>>>>>>>>>>>> otherwise illegal entry that finds its way into the system
>>>>>>>>>>>>> (i.e.
>>>>>>>>>>>>> migration).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Consider this. We migrate a group with a space in it. This
>>>>>>>>>>>>> isn't
>>>>>>>>>>>>> allowed
>>>>>>>>>>>>> in IPA but we also provide no way to delete it because the cn
>>>>>>>>>>>>> regex
>>>>>>>>>>>>> kicks out the group-del command.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The trick is adding appropriate context so we can know during
>>>>>>>>>>>>> validation
>>>>>>>>>>>>> how we got here. A command object has a bases field which
>>>>>>>>>>>>> contains
>>>>>>>>>>>>> the
>>>>>>>>>>>>> base classes associated with it, which appears to contain only
>>>>>>>>>>>>> the
>>>>>>>>>>>>> leaf
>>>>>>>>>>>>> baseclass. So using this we can tell how we got to validation
>>>>>>>>>>>>> and
>>>>>>>>>>>>> can
>>>>>>>>>>>>> skip based on that baseclass name.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I went back and forth a bit on where the right place to put
>>>>>>>>>>>>> this
>>>>>>>>>>>>> was,
>>>>>>>>>>>>> I'm open to more fine tuning. I initially skipped just the
>>>>>>>>>>>>> pattern
>>>>>>>>>>>>> validation then expanded it to apply to all validation in the
>>>>>>>>>>>>> Parameter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> rob
>>>>>>>>>>>>
>>>>>>>>>>>> 1) This patch breaks API.txt, it needs re-generating.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) This may be a matter of opinion but I think that
>>>>>>>>>>>> + if self.onlyvalidateclasses and \
>>>>>>>>>>>> + not any(x in self.onlyvalidateclasses for x in classbases):
>>>>>>>>>>>> + return
>>>>>>>>>>>>
>>>>>>>>>>>> would be more readable than:
>>>>>>>>>>>>
>>>>>>>>>>>> + if self.onlyvalidateclasses and \
>>>>>>>>>>>> + not [x for x in classbases if x in self.onlyvalidateclasses]:
>>>>>>>>>>>> + return
>>>>>>>>>>>>
>>>>>>>>>>>> 3) I would move the entire self.onlyvalidateclasses up to the
>>>>>>>>>>>> validate()
>>>>>>>>>>>> method. It would have several benefits:
>>>>>>>>>>>> - the validation skip would be done just once, not for every
>>>>>>>>>>>> value as
>>>>>>>>>>>> the decision does not use the value at all
>>>>>>>>>>>> - we would not have to modify _validate_scalar() methods at all
>>>>>>>>>>>> since
>>>>>>>>>>>> they won't need classbases
>>>>>>>>>>>>
>>>>>>>>>>>> 4) I think it would be better to keep validation for --rename
>>>>>>>>>>>> options.
>>>>>>>>>>>> As it is generated from RDN attribute parameter, it inherits
>>>>>>>>>>>> onlyvalidateclasses list as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Otherwise your patch would let user to rename a valid object
>>>>>>>>>>>> to an
>>>>>>>>>>>> invalid one:
>>>>>>>>>>>>
>>>>>>>>>>>> # ipa user-mod fbar --rename="bad boy"
>>>>>>>>>>>> --------------------
>>>>>>>>>>>> Modified user "fbar"
>>>>>>>>>>>> --------------------
>>>>>>>>>>>> User login: bad boy
>>>>>>>>>>>> First name: Foo
>>>>>>>>>>>> Last name: Bar
>>>>>>>>>>>> Home directory: /home/fbar
>>>>>>>>>>>> Login shell: /bin/sh
>>>>>>>>>>>> UID: 480800040
>>>>>>>>>>>> GID: 480800040
>>>>>>>>>>>> Account disabled: False
>>>>>>>>>>>> Password: False
>>>>>>>>>>>> Member of groups: ipausers
>>>>>>>>>>>> Kerberos keys available: False
>>>>>>>>>>>>
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This should address your concerns.
>>>>>>>>>>>
>>>>>>>>>>> rob
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Its almost OK, there is just one part that's not that OK:
>>>>>>>>>>
>>>>>>>>>> @@ -831,6 +836,9 @@ class Param(ReadOnly):
>>>>>>>>>> else:
>>>>>>>>>> raise RequirementError(name=self.name)
>>>>>>>>>> return
>>>>>>>>>> + if 'rename' not in (self.name, self.cli_name):
>>>>>>>>>> + if self.onlyvalidateclasses and not [x for x in
>>>>>>>>>> self.onlyvalidateclasses if x in classbases]: #pylint:
>>>>>>>>>> disable=E1101
>>>>>>>>>> + return
>>>>>>>>>> if self.multivalue:
>>>>>>>>>> if type(value) is not tuple:
>>>>>>>>>> raise TypeError(
>>>>>>>>>>
>>>>>>>>>> I don't think that hard-coding this skip for onlyvalidateclasses
>>>>>>>>>> based
>>>>>>>>>> just on parameter name is a good thing to do and may cause
>>>>>>>>>> problems in
>>>>>>>>>> the future. For example if we create some option called "rename"
>>>>>>>>>> and
>>>>>>>>>> deliberately set onlyvalidateclasses for the option - it would
>>>>>>>>>> then be
>>>>>>>>>> skipped as well.
>>>>>>>>>>
>>>>>>>>>> I think it would be a better solution to just update
>>>>>>>>>> _get_rename_option() in baseldap.py to set onlyvalidateclasses
>>>>>>>>>> to ()
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I must say I'm not a big fan of this approach. You are healing the
>>>>>>>>> symptoms (custom validation on some parameters makes it
>>>>>>>>> impossible to
>>>>>>>>> manipulate existing LDAP entries with invalid attribute values =>
>>>>>>>>> add a
>>>>>>>>> way to mark parameters to be validated only in certain commands to
>>>>>>>>> prevent that), but the real issue here is that we should not
>>>>>>>>> perform
>>>>>>>>> custom validation when referring to existing LDAP attribute
>>>>>>>>> values at
>>>>>>>>> all (or only partially), no matter what parameter and command.
>>>>>>>>> Fixing
>>>>>>>>> this would make the problem go away for all commands, present or
>>>>>>>>> future,
>>>>>>>>> without the need for adding a list of command classes to each
>>>>>>>>> parameter
>>>>>>>>> that is affected.
>>>>>>>>>
>>>>>>>>> Honza
>>>>>>>>>
>>>>>>>>
>>>>>>>> It is all about context, and parameters have very little of it. The
>>>>>>>> bottom line is we only want to do validation on new values.
>>>>>>>>
>>>>>>>> I think I can bump this up a level by adding a validate boolean to
>>>>>>>> Param
>>>>>>>> and Command both defaulting to False. crud.Create will override
>>>>>>>> this to
>>>>>>>> True and cloning rename could perhaps also set this flag.
>>>>>>>>
>>>>>>>> Then in validation if the parent object has validation set or the
>>>>>>>> parameter does we validate, otherwise we skip it.
>>>>>>>>
>>>>>>>> This will require some other changes for Enums, they may always
>>>>>>>> require
>>>>>>>> validation (I'll need to be sure --delattr can remove a bad one).
>>>>>>>>
>>>>>>>> rob
>>>>>>>
>>>>>>> Updated patch. The primary key will be validated only on adds. The
>>>>>>> values will be validated on adds and mods.
>>>>>>
>>>>>> It turns out there already is infrastructure which does exactly the
>>>>>> same
>>>>>> thing: the "query" kwarg of the Parameter class. We should fix its
>>>>>> behavior rather than reinvent the wheel.
>>>>>
>>>>> This might be good enough:
>>>>>
>>>>> diff --git a/ipalib/crud.py b/ipalib/crud.py
>>>>> index 833914c..91b91cf 100644
>>>>> --- a/ipalib/crud.py
>>>>> +++ b/ipalib/crud.py
>>>>> @@ -144,7 +144,7 @@ class Create(Method):
>>>>> continue
>>>>> if 'ask_create' in option.flags:
>>>>> yield option.clone(
>>>>> - attribute=attribute, query=True, required=False,
>>>>> + attribute=attribute, required=False,
>>>>> autofill=False, alwaysask=True
>>>>> )
>>>>> else:
>>>>> @@ -189,7 +189,7 @@ class Update(PKQuery):
>>>>> continue
>>>>> if 'ask_update' in option.flags:
>>>>> yield option.clone(
>>>>> - attribute=attribute, query=True, required=False,
>>>>> + attribute=attribute, required=False,
>>>>> autofill=False, alwaysask=True
>>>>> )
>>>>> elif 'req_update' in option.flags:
>>>>> diff --git a/ipalib/parameters.py b/ipalib/parameters.py
>>>>> index d918a57..46b0ffb 100644
>>>>> --- a/ipalib/parameters.py
>>>>> +++ b/ipalib/parameters.py
>>>>> @@ -506,7 +506,7 @@ class Param(ReadOnly):
>>>>> self.class_rules = tuple(class_rules)
>>>>> self.rules = rules
>>>>> if self.query:
>>>>> - self.all_rules = self.class_rules
>>>>> + self.all_rules = ()
>>>>> else:
>>>>> self.all_rules = self.class_rules + self.rules
>>>>> for rule in self.all_rules:
>>>>
>>>>
>>>> This won't work.
>>>
>>> Why not?
>>>
>>>> query should only apply when searching. The idea is we
>>>> don't care what value you want to search for, even if it is illegal
>>>> (within reason). Only the default validators are run when query is True
>>>> IIRC.
>>>
>>> Well, query doc says:
>>>
>>> "query: this attribute is controlled by framework. When the `query` is
>>> enabled, framework assumes that the value is only queried and not
>>> inserted in the LDAP. Validation is then relaxed - custom parameter
>>> validators are skipped and only basic class validators are executed to
>>> check the parameter value"
>>>
>>> You can see that there is no mention of that it should be used only in
>>> searches and it is indeed not used only in searches - in ipalib.crud,
>>> query is set for all parameters except the primary key in Create and
>>> options in Create and Update, which means that the primary key is
>>> validated only in adds and the options are validated only in adds and
>>> mods, which is exactly the same thing you do in your patch.
>>>
>>> The way I see it, the issue is that the validation should be relaxed
>>> even more (which is what the "self.all_rules = ()" line does) or we
>>> should add means of specifying which class validation rules should be
>>> used when querying and which should not (Data._rule_pattern is one of
>>> the class rules that obviously should not be used when querying).
>>
>> We only want to relax validation on the primary key. The idea is to not
>> prevent someone from addressing an otherwise illegal entry.
>
> Again, we already *do* relax validation on the primary key (cloning the
> primary key parameter with query=True in ipalib.crud.PKQuery.get_args),
> but we do not relax it enough (using all class validation rules when
> query=True in ipalib.parameters.Parameter.__init__, which causes the
> pattern to be checked in ipalib.parameters.Parameter._validate_scalar,
> which we don't want to do).
>

Ok, I think you have me convinced.

>>
>>>>
>>>>>> Also, the "pattern" and "pattern_errmsg" kwargs should not be copied
>>>>>> from the primary key parameter, but from the RDN parameter
>>>>>> (imagine an
>>>>>> object which has cn as its primary key and ipaUniqueId as RDN, you
>>>>>> would
>>>>>> use the pattern from cn on ipaUniqueId, which is obviously not
>>>>>> right).
>>>>>> It is supposed be done automatically as part of the clone_rename()
>>>>>> call,
>>>>>> so if it's not happening, there's probably a bug in clone_rename.
>>>>
>>>> There currently isn't such an object. A special rename option would be
>>>> needed since as you point out the key isn't the RDN. If there were we'd
>>>> want the pattern from the key, not the RDN, as that is the one that
>>>> matters.
>>>
>>> There is netgroup, which does use cn as its primary key and ipaUniqueId
>>> as its RDN, though it doesn't have a pattern for cn. But there easily
>>> could be an object like netgroup with a pattern for cn and in that case
>>> the rename option would be broken and would have to be fixed anyway.
>>>
>>> The bottom line is that there is no guarantee that the primary key and
>>> RDN are the same attribute, so we can't just use the pattern from one on
>>> the other.
>>
>> It is a different problem. A rename is really an RDN rename. To rename a
>> netgroup all you have to do is modify the cn value. Something that
>> should be handled more gracefully than it is now (not at all).
>
> I'm not concerned about renaming netgroups at all ATM. I'm aware that
> rename triggers modrdn operation. What I don't get is this:
>
> has_output_params = global_output_params
>
> def _get_rename_option(self):
> + pattern = self.obj.primary_key.pattern
> + errmsg = self.obj.primary_key.pattern_errmsg
> +
> rdnparam = getattr(self.obj.params, self.obj.rdnattr)
> return rdnparam.clone_rename('rename',
> cli_name='rename', required=False, label=_('Rename'),
> + pattern = pattern, pattern_errmsg = errmsg,
> doc=_('Rename the %(ldap_obj_name)s object') % dict(
> ldap_obj_name=self.obj.object_name
> )
>
> You assume that using pattern from the primary key parameter on the RDN
> parameter is OK. Why?

The variable name rdnattr can be misleading. It is only used to give the 
name of hte RDN in something that can be renamed. Compare this to 
something like netgroups where the DN has no visible relationship to the 
content of the object (ipaUniqueId). Only those objects that define 
rdnattr get a rename option (look at netgroups, for example).

rdnattr is always the primary key but not always defined. It should 
probably be a boolean instead, rdn_is_primary_key or something a bit 
more obvious. I can make that change here.

rob