
Re: Why is "LANG=en_US.UTF-8" the default in Fedora

----- Original Message ----- 
From: "Shahms King" <shahms shahms com>
To: "For testers of Fedora Core development releases"
<fedora-test-list redhat com>
Sent: Friday, May 21, 2004 5:44 PM
Subject: Re: Why is "LANG=en_US.UTF-8" the default in Fedora

> Wrong again.  Take a look at the copy of the "C" standard you have next
> to you.  Look up "strcoll" or "setlocale" and surprise, surprise, they
> specify the behavior you're complaining about.  It's more than
> conceptually reasonable and it has nothing to do with Unicode.  You're
> complaining about two entirely separate issues.  Try removing the
> ".UTF-8" from the locale and watch as nothing changes with the sort
> order.  Applications which expect strcoll to behave like strcmp are, by
> definition, broken.  Yes, the shift to Unicode has been painful but the
> sort order is the smallest impact by far (and only tangentially related
> to Unicode).  Again, if you're expecting strcoll to behave like strcmp,
> your code is broken.  If you're expecting LANG=C behavior from 'ls',
> then either specify that or simply sort the output after you read it in
> using strcmp!  Breaking working apps to work around broken apps is a
> horrible idea: the working apps break while the broken apps never get
> fixed.
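
For anyone who wants to see the strcoll/strcmp distinction from the quoted
text in action, here is a minimal sketch. It assumes Python's standard
"locale" module; the helper name sort_both_ways is made up for illustration,
and en_US.UTF-8 may not be installed on a given machine, in which case the
locale-aware result is simply None.

```python
import locale


def sort_both_ways(words, locale_name="en_US.UTF-8"):
    """Return (codepoint_sorted, locale_sorted).

    locale_sorted is None when locale_name is not installed.
    sort_both_ways is a hypothetical helper, not part of any library.
    """
    # Plain sorted() compares by code point, which for ASCII text
    # matches what strcmp would give.
    codepoint_sorted = sorted(words)

    # Remember the current collation setting so it can be restored.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return codepoint_sorted, None
    try:
        # strxfrm produces keys that compare the way strcoll would.
        locale_sorted = sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)
    return codepoint_sorted, locale_sorted


if __name__ == "__main__":
    by_codepoint, by_locale = sort_both_ways(["b", "A", "a", "B"])
    print("code point order:", by_codepoint)
    print("locale order:    ", by_locale)
```

Passing "C" as locale_name should give the same order both ways; what an
installed en_US locale returns depends on the system's collation tables.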

Making a Unicode locale the default in place of the older "C" locale
broke many, many working apps.

The sort order is an *example* of the fun and wackiness: it means that the
output of an extremely old and extremely common command, "sort", is modified
by an entirely new environment variable that didn't even exist back
when "sort" was originally written. The alteration of the behavior mandates
a set of new regression tests across multiple locales, and frankly a lot of
authors of various multi-lingual tools and especially of documentation are
not working in locale-using environments.
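
A tool that depends on the old ordering can pin the locale for the one
command it runs instead of trusting whatever the login environment sets. A
rough sketch, assuming a "sort" binary on the PATH; the helper name run_sort
is made up for illustration. It sets LC_ALL rather than LANG because LC_ALL
overrides both LANG and LC_COLLATE.

```python
import os
import subprocess


def run_sort(lines, lc_all):
    """Run the external "sort" command with LC_ALL forced to lc_all.

    run_sort is a hypothetical helper; it assumes "sort" is on the PATH.
    """
    # Copy the current environment and override only LC_ALL.
    env = dict(os.environ, LC_ALL=lc_all)
    result = subprocess.run(
        ["sort"],
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    words = ["b", "A", "a", "B"]
    print("LC_ALL=C:          ", run_sort(words, "C"))
    # The result of this call depends on which locales are installed.
    print("LC_ALL=en_US.UTF-8:", run_sort(words, "en_US.UTF-8"))
```

With LC_ALL=C the output is in plain byte order, uppercase before lowercase.
The second call is the one whose result varies from machine to machine,
which is exactly the regression-testing problem.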

Try another example. Do a "find" command on /usr/share/man, and then do a
"man" of 20 or so randomly selected man pages under en_US.UTF-8. First,
marvel at all the amazingly unprintable or unparseable crap that shows up.
Then try to deal with piping that through pagers and the pager's abilities
to do a "find this string" function, such as the "/" command for more or
less, and see how the string you actually "found" is not on the page.

Now set LANG=C and try the same thing. Everything works now, doesn't it?
Guess how old most of those now-fractured man pages are? They worked
beautifully when their authors wrote them. This means re-writing a lot of old
documentation, which has thankfully been proceeding apace for a lot of
developers, but it's awfully hard when you have some in-house tool written
by someone who was nice enough to write the man page 10 years ago.

Or to quote Mr. Cosby, "Have you looked at the mess in the bottom of that
ark? Who's gonna clean up that mess in the bottom of the ark!"
