[Freeipa-devel] not ascii, not utf-8, what's a parser supposed to do?

Jason Gerard DeRose jderose at redhat.com
Tue Jan 26 22:49:51 UTC 2010


On Tue, 2010-01-26 at 17:28 -0500, John Dennis wrote:
> I've run into a small problem with xgettext. By default xgettext expects 
> all strings in an input file to be encoded in ascii. It will also allow 
> you to override that by specifying the strings in the input file are utf-8.
> 
> In ipapython/ipautil.py line 296 is the following string:
> 
> SAFE_STRING_PATTERN = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'
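For reference, here is a minimal Python 3 sketch of what this pattern flags. It assumes the pattern is applied with `re.search` to text strings; the original Python 2 code may apply it to byte strings instead. Reading the regex directly, it matches any of these:

- a leading NUL, LF, CR, space, colon or `<`;
- a run of NUL, LF, CR or characters in the range \x80 to \xff anywhere in the value;
- trailing spaces.

The helper `is_unsafe` is hypothetical and not part of ipautil.

```python
import re

# Same literal as in ipautil.py. In Python 3 the \200 and \377 escapes
# become the code points U+0080 and U+00FF.
SAFE_STRING_PATTERN = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'
_unsafe = re.compile(SAFE_STRING_PATTERN)


def is_unsafe(value):
    """Hypothetical helper: True if the pattern matches anywhere in value."""
    return _unsafe.search(value) is not None


print(is_unsafe("cn=admin"))       # False: plain printable ASCII
print(is_unsafe(":starts-colon"))  # True: forbidden first character
print(is_unsafe("trailing "))      # True: ends in a space
print(is_unsafe("caf\xe9"))        # True: U+00E9 lies inside \200-\377
```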

ipapython still has a lot of legacy code, so the first thing we should do is
check whether we even use SAFE_STRING_PATTERN. Rob, do you know offhand?
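One way to answer that is to search the tree for the identifier. A plain `grep -rn SAFE_STRING_PATTERN .` does the job. The Python sketch below does the same thing, assuming it is run from the top of a source checkout. `find_references` is a hypothetical helper. It does a simple substring match, so it also reports comments and the definition itself. Any hit outside ipapython/ipautil.py's own definition would suggest the pattern is still in use.

```python
import os


def find_references(root, name, suffix=".py"):
    """Hypothetical helper: yield (path, line_number, line) for every line
    in a *suffix file under root whose text contains name."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(suffix):
                continue
            path = os.path.join(dirpath, filename)
            # errors="replace" so a stray non-UTF-8 byte cannot abort the scan
            with open(path, encoding="utf-8", errors="replace") as f:
                for number, line in enumerate(f, start=1):
                    if name in line:
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # Assumes the current directory is the top of the source tree.
    for path, number, line in find_references(".", "SAFE_STRING_PATTERN"):
        print(f"{path}:{number}: {line}")
```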

> In its default ascii mode xgettext throws an error claiming the string 
> is not ascii. In fact xgettext is correct: the string is not ascii. (You 
> may be wondering why xgettext even cares, since the string is not marked 
> as translatable. xgettext fully parses the input before deciding what 
> is marked as translatable; bottom line: all strings get parsed and decoded.)
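Here is a minimal Python 3 sketch of what xgettext is objecting to. The literal's source text is plain ASCII. Once decoded, however, the escapes \200 and \377 become the characters U+0080 and U+00FF, and those have no ASCII encoding. This only mimics the check at the Python level. It does not reproduce xgettext's own lexer.

```python
SAFE_STRING_PATTERN = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

# Characters produced by decoding the escape sequences that fall
# outside 7-bit ASCII.
non_ascii = [hex(ord(c)) for c in SAFE_STRING_PATTERN if ord(c) > 0x7f]
print(non_ascii)  # ['0x80', '0xff']

try:
    SAFE_STRING_PATTERN.encode("ascii")
except UnicodeEncodeError as exc:
    print("not ascii:", exc.reason)  # not ascii: ordinal not in range(128)
```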
> 
> If I override the default by telling xgettext that the input 
> strings are encoded in utf-8, xgettext stops complaining and the string 
> is properly skipped.
> 
> But ... the string isn't really utf-8 either, so I'm not sure how 
> comfortable I feel about telling xgettext that every string in IPA is 
> encoded in utf-8 just to get around this failure. (On the other hand, 
> maybe we should allow utf-8 as an input encoding anyway: ascii is a 
> subset of utf-8, we might want to use utf-8 in the future, and we can 
> just hold our noses with respect to the above regular expression.)
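The utf-8 half of the problem can be sketched the same way. This assumes Python 2 semantics, where the unprefixed literal is a byte string; the sketch recreates that with a Python 3 `b''` literal. The first non-ASCII byte, 0x80, is a UTF-8 continuation byte with no lead byte before it. 0xff never appears in well-formed UTF-8 at all. So the bytes do not decode as utf-8.

```python
# Python 2 str literals are byte strings; a bytes literal reproduces that,
# so the escapes yield the raw bytes 0x80 and 0xff.
raw = b'(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # Decoding stops at the first byte that cannot start a UTF-8 sequence.
    print(exc.start, hex(raw[exc.start]))  # 20 0x80
```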
> 
> Do we have a stake in the ground as to what our input strings are 
> encoded in?
> 
> Can you think of another way to express the offending string such that 
> it doesn't trigger the non-ascii error? The only thing I could think of 
> and get to work was this:
> 
> SAFE_STRING_PATTERN='%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c' % \
>     (40,94,40,0,124,10,124,13,124,32,124,58,124,60,41,124,91,0,10,13,128,45,255,93,43,124,91,32,93,43,36,41)
> 
> It's pretty unreadable, but with sufficient comments it could be 
> acceptable.
> 
> 
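The Python 3 sketch below first checks that the %c version yields exactly the same 32-character string as the original literal. It then tries one alternative not raised in the thread: a raw string. In a raw string, \000, \200 and \377 stay as backslash-digit text, and the `re` module interprets them as octal escapes. Whether xgettext accepts such a file is an assumption here. It depends on xgettext not decoding escapes inside raw strings, which should be confirmed against the xgettext version in use. The names `CODES`, `PERCENT_C` and `RAW` are illustrative.

```python
import re

ORIGINAL = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

# The character-code workaround from the message above.
CODES = (40, 94, 40, 0, 124, 10, 124, 13, 124, 32, 124, 58, 124, 60, 41, 124,
         91, 0, 10, 13, 128, 45, 255, 93, 43, 124, 91, 32, 93, 43, 36, 41)
PERCENT_C = '%c' * len(CODES) % CODES

# Hypothetical alternative: a raw string. Every character in it is ASCII;
# re treats \000, \200 and \377 as octal escapes and \n, \r as the usual
# control characters.
RAW = r'(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

print(PERCENT_C == ORIGINAL)  # True
print(RAW.isascii())          # True

# The raw pattern should match exactly the same sample inputs.
samples = ["cn=admin", ":x", "<x", " x", "x ", "a\x00b", "a\nb",
           "caf\xe9", "\x80", "\xff", "\x7f", "\u0100"]
for s in samples:
    assert bool(re.search(RAW, s)) == bool(re.search(ORIGINAL, s)), s
```

The raw string keeps the pattern readable without per-character comments. The %c form, by contrast, avoids relying on how any particular tool lexes raw strings.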



