[Freeipa-devel] [PATCH 0070] Normalization check only for IDNA domains

Mon Jun 30 11:49:14 UTC 2014

On Mon, 2014-06-30 at 11:43 +0300, Alexander Bokovoy wrote:
> On Mon, 30 Jun 2014, Martin Basti wrote:
> >On Fri, 2014-06-27 at 14:03 +0300, Alexander Bokovoy wrote:
> >> On Fri, 27 Jun 2014, Martin Kosek wrote:
> >> >On 06/27/2014 12:10 PM, Alexander Bokovoy wrote:
> >> >> On Fri, 27 Jun 2014, Petr Spacek wrote:
> >> >>> On 27.6.2014 11:21, Jan Cholasta wrote:
> >> >>>> On 27.6.2014 10:58, Alexander Bokovoy wrote:
> >> >>>>> On Fri, 27 Jun 2014, Jan Cholasta wrote:
> >> >>>>>> On 27.6.2014 10:29, Alexander Bokovoy wrote:
> >> >>>>>>> On Fri, 27 Jun 2014, Jan Cholasta wrote:
> >> >>>>>>>> On 27.6.2014 10:15, Alexander Bokovoy wrote:
> >> >>>>>>>>> On Fri, 20 Jun 2014, Martin Basti wrote:
> >> >>>>>>>>>> On Fri, 2014-06-20 at 10:32 +0200, Jan Cholasta wrote:
> >> >>>>>>>>>>> On 18.6.2014 16:49, Martin Basti wrote:
> >> >>>>>>>>>>>> Due to compatibility with older versions, only IDNA domains should be
> >> >>>>>>>>>>>> checked
> >> >>>>>>>>>>>> Patch attached.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> I'm not particularly happy about the u'\xdf' special case. Isn't
> >> >>>>>>>>>>> there a
> >> >>>>>>>>>>> better way to do this check?
> >> >>>>>>>>>> I can't find a better way. u'\xdf' is mapped to ss, and ss is not an IDN
> >> >>>>>>>>>> string.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Or just remove this validation.
> >> >>>>>>>>>>
> >> >>>>>>>>>>> (BTW I really think this should be a warning, not an error, but that
> >> >>>>>>>>>>> would require larger amount of work, so I guess it's OK for now.)
> >> >>>>>>>>>> (More pain than gain)
> >> >>>>>>>>> Main thing in this patch is that the check should not be done against
> >> >>>>>>>>> non-IDN strings. I want this version of the patch to go in for that
> >> >>>>>>>>> reason as currently you cannot even complete ipa-adtrust-install
> >> >>>>>>>>> run due
> >> >>>>>>>>> to IDN normalisation check being applied to non-IDN domains.
> >> >>>>>>>>
> >> >>>>>>>> On non-IDN domains, the only effect of IDN normalization is that it
> >> >>>>>>>> lower-cases the names (right?), so the check should compare
> >> >>>>>>>> lower-cased original name with the normalized name, instead of
> >> >>>>>>>> special-casing certain characters etc.
> >> >>>>>>> .. what's the reason to do such comparison then? lower-cased non-IDN
> >> >>>>>>> name will be equal to lower-cased normalized non-IDN name by definition,
> >> >>>>>>> so the check is not needed in this case, at all.
> >> >>>>>>
> >> >>>>>> The point is that it works for both IDN and non-IDN, without
> >> >>>>>> u'\xdf'-style hacks.
> >> >>>>> No, your proposal of comparing low-cased value and normalized value is
> >> >>>>> not going to work because low-cased value is in general not equal to
> >> >>>>> normalized value for IDN names, only for non-IDN ones, due to the fact
> >> >>>>> that lower case for non-ASCII Unicode character may map to a completely
> >> >>>>> different character than in normalization situation. Take, for example,
> >> >>>>> Turkish alphabet where there are six letters with different case rules
> >> >>>>> (uppercase dotted i, dotless lowercase i, upper- and lowercase G with
> >> >>>>> breve accent, and upper- and lowercase S with cedilla), which will break
> >> >>>>> your generalized check.
> >> >>>>> So you will need to split these cases anyway.
> >> >>>>>
> >> >>>>
> >> >>>> I see.
> >> >>>>
> >> >>>> I'm still not comfortable with carrying the bit of knowledge about u'\xdf' in
> >> >>>> this particular spot. Can we check that a name is IDN some other way than
> >> >>>> "domain_name.is_idn() or u'\xdf' in value"?
> >> >>>
> >> >>> Why can't we simply fix string constants in ipa-adtrust-install and avoid
> >> >>> adding hacks for it?
> >> >> Because they are correct, in the sense that they follow what is defined
> >> >> for Active Directory. Yes, AD puts them in that case into DNS. There is
> >> >> simply no reason to force lower case for non-IDN names.
> >> >>
> >> >> That said, a newer fix is attached, where error message is formatted
> >> >> properly.
> >> >
> >> >I would personally be OK with the change if the is_* are fixed as Honza
> >> >proposed, the current way is not very Pythonic/readable. I.e.:
> >> >
> >> >Instead of
> >> >+            is_idna = True in [encodings.idna.ToASCII(x) != x for x in labels]
> >> >Use
> >> >+            is_idna = any(encodings.idna.ToASCII(x) != x for x in labels)
> >> >
> >> >Instead of
> >> >+                is_nonnorm = True in [encodings.idna.nameprep(x) != x for x in labels]
> >> >use
> >> >+                is_nonnorm = any(encodings.idna.nameprep(x) != x for x in labels)
> >> >
> >> >However, we can wait till Monday for Martin2's feedback.
> >> I've fixed this and also made a proper split on all dots that could
> >> separate labels according to RFC3490:
> >>
> >>     U+002E ( . ) FULL STOP
> >>     U+FF0E ( ． ) FULLWIDTH FULL STOP
> >>     U+3002 ( 。 ) IDEOGRAPHIC FULL STOP
> >>     U+FF61 ( ｡ ) HALFWIDTH IDEOGRAPHIC FULL STOP
> >>
> >
> >Hi,
> >I analyzed how python detects IDNA labels.
> >
> >Python tests if domain is IDNA in this way:
> >
> >def ToASCII(label):
> >    try:
> >        # Step 1: try ASCII
> >        label = label.encode("ascii")
> >    except UnicodeError:
> >        pass
> >    else:
> >        # Skip to step 3: UseSTD3ASCIIRules is false, so
> >        # Skip to step 8.
> >        if 0 < len(label) < 64:
> >            return label
> >        raise UnicodeError("label empty or too long")
> >
> >    # Step 2: nameprep
> >    label = nameprep(label)
> >...
> >
> >We can use 'label = label.encode("ascii")' to detect if IDNA is needed,
> >without idna.ToASCII() conversion, and then use:
> >
> >is_nonnorm = any(encodings.idna.nameprep(x) != x for x in labels)
> Sounds good but don't forget exceptions' handling. :)
> 

Updated patch attached.

I modified the error messages, because IDNA mapping is not only a mapping to lowercase.
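
For readers following the thread, here is a minimal sketch of the approach
discussed above. It is not the attached patch: the function names
(split_labels, needs_idna, is_non_normalized_idn) are hypothetical, and it
assumes Python 3 and the standard encodings.idna module. It splits on the four
RFC 3490 dots, detects IDNA labels with a plain ASCII encode, and runs the
nameprep comparison only when at least one label is non-ASCII.

```python
import encodings.idna
import re

# The four label separators listed in RFC 3490.
_DOTS = re.compile(u'[\u002e\uff0e\u3002\uff61]')


def split_labels(domain):
    """Split on any RFC 3490 dot, dropping empty labels (e.g. after a trailing dot)."""
    return [label for label in _DOTS.split(domain) if label]


def needs_idna(label):
    """Return True if the label contains non-ASCII characters."""
    try:
        label.encode('ascii')
    except UnicodeError:
        return True
    return False


def is_non_normalized_idn(domain):
    """Return True only for IDN names that nameprep would change."""
    labels = split_labels(domain)
    if not any(needs_idna(x) for x in labels):
        # Pure ASCII name: skip the normalization check entirely.
        return False
    try:
        return any(encodings.idna.nameprep(x) != x for x in labels)
    except UnicodeError:
        # Assumption: a label that nameprep rejects counts as not normalized.
        return True
```

With this sketch an upper-case ASCII name such as u'EXAMPLE.COM' is not
flagged, while u'stra\xdfe.example' is, because nameprep maps u'\xdf' to 'ss'
even though lower() leaves it unchanged.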
-- 
Martin^2 Basti
-------------- next part --------------
A non-text attachment was scrubbed...
Name: freeipa-mbasti-0070-2-Check-normalization-only-for-IDNA-domains.patch
Type: text/x-patch
Size: 3504 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/freeipa-devel/attachments/20140630/f54adc5a/attachment.bin>