[Freeipa-devel] not ascii, not utf-8, what's a parser supposed to do?

Dmitri Pal dpal at redhat.com
Tue Jan 26 22:46:58 UTC 2010


John Dennis wrote:
> I've run into a small problem with xgettext. By default xgettext
> expects all strings in an input file to be encoded in ascii. It will
> also allow you to override that by specifying the strings in the input
> file are utf-8.
>
> In ipapython/ipautil.py line 296 is the following string:
>
> SAFE_STRING_PATTERN = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'
>
> In its default ascii mode xgettext throws an error claiming the
> string is not ascii. In fact xgettext is correct, the string is not
> ascii. (You may be wondering why xgettext even cares since it's not
> marked as translatable, but xgettext fully parses the input before
> deciding what is marked as translatable. Bottom line: all strings get
> parsed and decoded.)
>
> If I override the default ascii input by telling xgettext the input
> strings are encoded in utf-8, xgettext stops complaining and the
> string is properly skipped.
>
> But ... the string isn't really utf-8 either and I'm not sure how
> comfortable I feel about telling xgettext every string in IPA is
> encoded in utf-8 (when it isn't) just to get around this failure,
> especially since the offending string isn't even utf-8. (However,
> maybe we should allow utf-8 as an input format since ascii is a subset
> of utf-8, we might want to use utf-8 in the future and we can just
> hold our noses with respect to the above regular expression).
>
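
The "not ascii, not utf-8" observation can be checked directly. This is
only a sketch, assuming Python 3, where a bytes literal stands in for
the byte string that a Python 2 str literal holds. The helper name
decodes_as is made up for illustration and is not part of IPA.

```python
# The pattern as raw bytes; \200 and \377 are the octal escapes
# for the bytes 0x80 and 0xFF.
PATTERN_BYTES = b'(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'


def decodes_as(data, encoding):
    """Return True if data is valid in the given encoding (hypothetical helper)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# A lone 0x80 is outside ascii and is not a valid utf-8 start byte,
# so both strict decodings reject the string.
print(decodes_as(PATTERN_BYTES, 'ascii'))    # False
print(decodes_as(PATTERN_BYTES, 'utf-8'))    # False
# latin-1 maps every byte value to a code point, so it always decodes.
print(decodes_as(PATTERN_BYTES, 'latin-1'))  # True
```
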
> Do we have a stake in the ground as to what our input strings are
> encoded in?
>
> Can you think of another way to express the offending string such that
> it doesn't trigger the non-ascii error? The only thing I could think
> of and get to work was this:
>

Put a comment and add the original string in the comment. I think that
would be sufficient and IMO we can use the representation below.

> SAFE_STRING_PATTERN = '%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c' % \
>     (40,94,40,0,124,10,124,13,124,32,124,58,124,60,41,124,91,0,10,13,128,45,255,93,43,124,91,32,93,43,36,41)
>
>
> Which is pretty unreadable, but with sufficient comments could be
> acceptable.
>
>
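
If the character-code form is used, it may be worth checking mechanically
that it still spells the original pattern. A minimal sketch, assuming
Python 3 (where '%c' % 128 yields the single character U+0080); the
names ORIGINAL, CODES and REBUILT are illustrative only.

```python
import re

# The original literal, kept here only for comparison.
ORIGINAL = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

# Character codes from the proposed replacement, one per character.
CODES = (40, 94, 40, 0, 124, 10, 124, 13, 124, 32, 124, 58, 124, 60, 41,
         124, 91, 0, 10, 13, 128, 45, 255, 93, 43, 124, 91, 32, 93, 43,
         36, 41)

# One '%c' per code, so the format string cannot drift out of step
# with the tuple.
REBUILT = ('%c' * len(CODES)) % CODES

# The rebuilt pattern must be identical to the original.
assert REBUILT == ORIGINAL

pattern = re.compile(REBUILT)
print(pattern.search('hello') is None)       # True: nothing to flag
print(pattern.search(' hello') is not None)  # True: leading space
print(pattern.search('hello ') is not None)  # True: trailing space
```

Building the format string from len(CODES) avoids having to count 32
conversions by hand, which is the easiest place for this form to go wrong.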


-- 
Thank you,
Dmitri Pal

Engineering Manager IPA project,
Red Hat Inc.

