[Freeipa-devel] not ascii, not utf-8, what's a parser supposed to do?
Jason Gerard DeRose
jderose at redhat.com
Tue Jan 26 22:49:51 UTC 2010
On Tue, 2010-01-26 at 17:28 -0500, John Dennis wrote:
> I've run into a small problem with xgettext. By default xgettext expects
> all strings in an input file to be encoded in ascii. It will also allow
> you to override that by specifying the strings in the input file are utf-8.
>
> In ipapython/ipautil.py line 296 is the following string:
>
> SAFE_STRING_PATTERN = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

ipapython still has a lot of legacy code, so the first thing we should do is
check whether we even use SAFE_STRING_PATTERN. Rob, do you know offhand?

> In its default ascii mode xgettext throws an error claiming the string
> is not ascii. In fact xgettext is correct, the string is not ascii. (You
> may be wondering why xgettext even cares since it's not marked as
> translatable, but xgettext fully parses the input before deciding what
> is marked as translatable, bottom line: all strings get parsed and decoded).
>
> If I override the default ascii input by telling xgettext the input
> strings are encoded in utf-8, xgettext stops complaining and the string
> is properly skipped.
>
> But ... the string isn't really utf-8 either and I'm not sure how
> comfortable I feel about telling xgettext every string in IPA is encoded
> in utf-8 (when it isn't) just to get around this failure, especially
> since the offending string isn't even utf-8. (However, maybe we should
> allow utf-8 as an input format since ascii is a subset of utf-8, we
> might want to use utf-8 in the future and we can just hold our noses
> with respect to the above regular expression).
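
A quick way to confirm that the decoded bytes are neither ascii nor utf-8.
This is only a sketch in Python 3 syntax (an assumption; in our Python 2
code the literal is already a byte string), and the names `raw` and
`decodes_as` are made up for illustration. It says nothing about how
xgettext itself decodes its input.

```python
# The character class from SAFE_STRING_PATTERN, as the bytes the
# octal escapes \200-\377 decode to (0x80 through 0xFF).
raw = b'[\000\n\r\200-\377]+'


def decodes_as(data, encoding):
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as(raw, 'ascii'))    # False: 0x80 and 0xFF are outside ascii
print(decodes_as(raw, 'utf-8'))    # False: a lone 0x80 is not valid utf-8
print(decodes_as(raw, 'latin-1'))  # True: every byte maps to a code point
```

So the pattern is really a byte-oriented (latin-1 style) range, which is
why neither xgettext setting describes it accurately.
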
>
> Do we have a stake in the ground as to what our input strings are
> encoded in?
>
> Can you think of another way to express the offending string such that
> it doesn't trigger the non-ascii error? The only thing I could think of
> and get to work was this:
>
> SAFE_STRING_PATTERN='%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c' % \
> (40,94,40,0,124,10,124,13,124,32,124,58,124,60,41,124,91,0,10,13,128,45,255,93,43,124,91,32,93,43,36,41)
>
> Which is pretty unreadable, but with sufficient comments could be
> acceptable.
>
>
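
If we end up keeping the pattern, the code-point list can at least be
checked mechanically against the original, and a raw string might be a more
readable alternative, since it leaves the octal escapes for the re module to
interpret and keeps the Python string itself pure ascii. The following is a
sketch only: it uses Python 3 syntax, the names ORIGINAL, CODES, FROM_CODES,
RAW_PATTERN, SAMPLES and is_unsafe are hypothetical, and whether xgettext
accepts the raw-string form is untested.

```python
import re

# The pattern as it appears in ipautil.py (escapes decoded by Python).
ORIGINAL = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

# The code points from the proposed %c workaround.
CODES = (40, 94, 40, 0, 124, 10, 124, 13, 124, 32, 124, 58, 124, 60, 41,
         124, 91, 0, 10, 13, 128, 45, 255, 93, 43, 124, 91, 32, 93, 43,
         36, 41)
FROM_CODES = ''.join(chr(c) for c in CODES)

# Raw string: Python keeps the backslashes, re decodes the octal escapes.
RAW_PATTERN = r'(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

SAMPLES = ['plain', ' leading', 'trailing ', ':colon', 'a\nb', 'caf\xe9']


def is_unsafe(pattern, value):
    """Return True if the pattern flags the value as not a safe string."""
    return re.search(pattern, value) is not None


# The code-point list rebuilds the original pattern exactly.
print(FROM_CODES == ORIGINAL)  # True

# The raw-string form flags the same samples as the original.
print(all(is_unsafe(ORIGINAL, s) == is_unsafe(RAW_PATTERN, s)
          for s in SAMPLES))  # True
```

The sample list is small, so this shows agreement on those inputs only, not
a proof that the two forms are equivalent.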