Unicode support in fedora (was: Re: flac/mp3 tagging Latin characters)

James Wilkinson james at westexe.demon.co.uk
Mon Dec 20 01:22:18 UTC 2004


Nadeem Bitar wrote:
> Thanks for the detailed explanation. I actually understand Unicode
> pretty well but I am curious why it doesn't "just work" by now in
> Fedora and probably other Linux distributions.

Sorry, I wasn't sure how far back I had to go.

By and large, things actually do work well *within* Fedora.

The problem is the long-standing assumption that there's a simple,
fixed relationship between numbers and characters. [1] That is no longer
true.

The relationship is now between numbers, characters, and the character
set in use: the same number can mean different characters in different
character sets. But too much software forgets to record the character
set. At that point the information about which characters were meant is
lost, and all that can be done is guess. Guessing *mostly* works.
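
To make the guessing concrete, here is a minimal sketch in Python. The
helper name decode_guesses, the sample title and the list of candidate
character sets are all made up for illustration; nothing here is taken
from any particular tagging program.

```python
def decode_guesses(raw, charsets=("utf-8", "latin-1", "cp1251")):
    """Decode the same bytes under several assumed character sets."""
    results = {}
    for name in charsets:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None  # these bytes are not valid in this charset
    return results


title = "Caf\u00e9"                  # "Café"
stored = title.encode("latin-1")     # written as Latin-1, charset not recorded

# Three readers, three assumptions, three different answers.
guesses = decode_guesses(stored)

# The opposite mistake: UTF-8 bytes read back as if they were Latin-1.
mojibake = title.encode("utf-8").decode("latin-1")
```

Only the Latin-1 guess recovers the original title. The UTF-8 guess is
rejected outright, and the Windows-1251 guess silently yields a Cyrillic
letter in place of the accented one, which is why a wrong guess often
goes unnoticed until someone reads the text.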

In particular, it looks like MP3s lose this information: there's nowhere
in the standard for it to go. (As far as I can tell; I haven't read the
standard.) Adding a place for it would break binary compatibility, and
existing programs wouldn't know what to do with the character set data
anyway.
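
As a rough illustration, assume the tags in question are ID3v1, the
original 128-byte block appended to the end of an MP3 file. Its layout is
fixed-width fields with no slot for a character set, which the sketch
below unpacks. (Later ID3v2 tags do reserve a byte per text frame for the
encoding, so this applies only to the older format.) The function names
and the sample tag are hypothetical.

```python
import struct

# ID3v1 layout: "TAG", title, artist, album, year, comment, genre.
# Every text field is a fixed-width run of bytes; none names a charset.
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")


def make_id3v1(title, artist):
    """Build a 128-byte ID3v1 block from raw bytes (short fields are NUL-padded)."""
    return ID3V1.pack(b"TAG", title, artist, b"", b"", b"", 255)


def parse_id3v1(block):
    """Split an ID3v1 block into raw byte fields, without decoding them."""
    if len(block) != ID3V1.size or not block.startswith(b"TAG"):
        return None
    _, title, artist, album, year, comment, genre = ID3V1.unpack(block)
    return {
        "title": title.rstrip(b"\x00 "),
        "artist": artist.rstrip(b"\x00 "),
        "album": album.rstrip(b"\x00 "),
        "year": year.rstrip(b"\x00 "),
        "comment": comment.rstrip(b"\x00 "),
        "genre": genre,
    }


tag = make_id3v1("Caf\u00e9".encode("latin-1"), b"Someone")
fields = parse_id3v1(tag)
# fields["title"] is just bytes; the reader still has to guess the charset.
```

The parser can hand back bytes and nothing more. Growing the block to add
a charset field would move every other field, which is the binary
compatibility problem mentioned above.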

I understand that a lot of filesystems also fail to record the character
set in use (and hence lose the meaning of all non-ASCII characters in
filenames). And those filesystems that don't lose it "save" it by
declaring that All Filenames Must Be In UTF-16, or whatever.

James.

[1] Unicode breaks that, but it's a well-understood system, the breakage
is necessary for Unicode to do what it does, and the breakage isn't too
widespread, as far as I can tell.
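
One possible reading of this footnote, offered as an assumption rather
than as what was necessarily meant: in Unicode a single visible character
need not be a single number. The sketch below uses Python's standard
unicodedata module; the variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"       # e-acute as one code point
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# Both render as the same character, yet they are different sequences.
looks_equal_but_differs = composed != decomposed
decomposed_length = len(decomposed)

# Normalising to a common form (NFC here) makes them compare equal.
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# One code point is also not one byte once encoded as UTF-8.
utf8_length = len(composed.encode("utf-8"))
```

The rules for this are published and deterministic, which fits the
description of the breakage as well understood.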

-- 
E-mail address: james | When I was young I wanted to be a fireman, but I
@westexe.demon.co.uk  | dropped that idea when they explained to me that
                      | firemen don't actually make fires.
                      |     -- Konqi the dragon, KDE's mascot



