[Freeipa-devel] [PATCH] 971 detect binary LDAP data

Wed Feb 29 09:14:57 UTC 2012

On 28.2.2012 18:58, Rob Crittenden wrote:
> Jan Cholasta wrote:
>> On 28.2.2012 18:02, Petr Viktorin wrote:
>>> On 02/28/2012 04:45 PM, Rob Crittenden wrote:
>>>> Petr Viktorin wrote:
>>>>> On 02/28/2012 04:02 AM, Rob Crittenden wrote:
>>>>>> Petr Viktorin wrote:
>>>>>>> On 02/27/2012 05:10 PM, Rob Crittenden wrote:
>>>>>>>> Rob Crittenden wrote:
>>>>>>>>> Simo Sorce wrote:
>>>>>>>>>> On Mon, 2012-02-27 at 09:44 -0500, Rob Crittenden wrote:
>>>>>>>>>>> We are pretty trusting that the data coming out of LDAP matches its
>>>>>>>>>>> schema but it is possible to stuff non-printable characters into most
>>>>>>>>>>> attributes.
>>>>>>>>>>>
>>>>>>>>>>> I've added a sanity checker that keeps such a value as a Python str
>>>>>>>>>>> (treated as binary internally). This will result in a base64-encoded
>>>>>>>>>>> blob being returned to the client.
>>
>> I don't like the idea of having arbitrary binary data where unicode
>> strings are expected. It might cause some unexpected errors (I have a
>> feeling that --addattr and/or --delattr and possibly some plugins might
>> not handle this very well). Wouldn't it be better to just throw away the
>> value if it's invalid and warn the user?
>
> This isn't for user input, it is for data stored in LDAP. Users are
> going to have no way to provide binary data to us unless they use the
> API themselves, in which case they have to follow our rules.

Well, my point was that --addattr and --delattr cause an LDAP search for
the given attribute, and plugins might get the result of an LDAP search in
their post_callback. I'm not sure they can cope with binary data.

>
>>
>>>>>>>>>>
>>>>>>>>>> Shouldn't you try to parse it as a unicode string and catch TypeError
>>>>>>>>>> to know when to return it as binary?
>>>>>>>>>>
>>>>>>>>>> Simo.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What we do now is the equivalent of unicode(chr(0)), which returns
>>>>>>>>> u'\x00' without raising, and that is why we are failing now.
>>>>>>>>>
>>>>>>>>> I believe there is a unicode category module; we might be able to use
>>>>>>>>> that if there is a category that defines non-printable characters.
>>>>>>>>>
>>>>>>>>> rob
>>>>>>>>
>>>>>>>> Like this:
>>>>>>>>
>>>>>>>> import unicodedata
>>>>>>>>
>>>>>>>> def contains_non_printable(val):
>>>>>>>>     for c in val:
>>>>>>>>         if unicodedata.category(unicode(c)) == 'Cc':
>>>>>>>>             return True
>>>>>>>>     return False
>>>>>>>>
>>>>>>>> Unlike the ord()-based check, this wouldn't exclude tab, CR and LF,
>>>>>>>> but it is probably more correct.
>>>>>>>>
>>>>>>>> rob
>>>>>>>>
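The category-based check quoted above can be sketched with the tab/CR/LF exclusion made optional. This is a minimal sketch written for Python 3 (the thread's code is Python 2), and the ALLOWED_CONTROLS set and the allow_whitespace parameter are assumptions added for illustration, not part of the patch under review.

```python
import unicodedata

# Hypothetical allow-list: the control characters the ord()-based check let through.
ALLOWED_CONTROLS = {'\t', '\n', '\r'}

def contains_non_printable(val, allow_whitespace=True):
    """Return True if val contains a character in Unicode category Cc.

    With allow_whitespace=True, tab, LF and CR are tolerated even though
    they are also in category Cc.
    """
    for c in val:
        if allow_whitespace and c in ALLOWED_CONTROLS:
            continue
        if unicodedata.category(c) == 'Cc':
            return True
    return False
```

Passing allow_whitespace=False gives the stricter behaviour of the quoted function, where any Cc character, including tab, counts as non-printable.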
>>>>>>>
>>>>>>> If you're protecting the XML-RPC calls, it'd probably be better to look
>>>>>>> at the XML spec directly: http://www.w3.org/TR/xml/#charsets
>>>>>>>
>>>>>>> Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
>>>>>>> [#x10000-#x10FFFF]
>>>>>>>
>>>>>>> I'd say this is a good set for CLI as well.
>>>>>>>
>>>>>>> And you can trap invalid UTF-8 sequences by catching the
>>>>>>> UnicodeDecodeError from decode().
>>>>>>>
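The two checks suggested above (the XML 1.0 Char production and trapping UnicodeDecodeError) could be combined roughly as follows. This is a sketch for Python 3 under stated assumptions: the name decode_ldap_value and the choice to return the raw bytes unchanged on failure are hypothetical, not taken from the patch.

```python
import re

# Matches any character outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NON_XML_CHAR = re.compile(
    r'[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)

def decode_ldap_value(raw):
    """Return text if raw is valid UTF-8 made only of XML Chars, else raw.

    Hypothetical helper: values left as bytes would be treated as binary
    by the caller.
    """
    try:
        text = raw.decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8 at all: keep it as binary.
        return raw
    if _NON_XML_CHAR.search(text):
        # Valid UTF-8, but contains a character XML 1.0 cannot carry.
        return raw
    return text
```

Note that a NUL byte decodes without error, so it is only caught by the character-class test, not by the exception handler.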
>>
>> I don't think we should care about XML-RPC in LDAP-specific code at all.
>> If you want to do some additional checks, do them in XML-RPC-specific
>> code.
>
> We are trusting that the data in LDAP matches its schema. This is just
> a belt-and-suspenders check that this is the case.

Sure, but I still think we should allow any valid unicode data to come 
from LDAP, not just what is valid in XML-RPC.
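
To illustrate the split I have in mind, here is a rough Python 3 sketch: the LDAP side accepts any valid UTF-8, and only the XML-RPC side decides what it cannot transport. The function names from_ldap and for_xmlrpc are made up for this example and are not existing IPA code; xmlrpc.client.Binary is the standard-library wrapper that is sent base64-encoded.

```python
import re
from xmlrpc.client import Binary

# Characters outside the XML 1.0 Char production (transport-level concern only).
_NON_XML_CHAR = re.compile(
    r'[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)

def from_ldap(raw):
    """LDAP layer (hypothetical): any valid UTF-8 becomes text."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw  # genuinely binary data stays as bytes

def for_xmlrpc(value):
    """XML-RPC layer (hypothetical): wrap what XML cannot carry as text."""
    if isinstance(value, bytes):
        return Binary(value)
    if _NON_XML_CHAR.search(value):
        return Binary(value.encode('utf-8'))
    return value
```

With this layering, from_ldap(b'\x00') still yields a unicode string for plugins and other callers, and only for_xmlrpc() turns it into a base64 blob on the way out.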

>
> rob

Honza

-- 
Jan Cholasta