OT FireFox - ascii/html symbols displayed

Sun Apr 8 16:15:55 UTC 2007

On Sun, 2007-04-08 at 08:27 -0600, David G. Miller wrote:
> You may find this link helpful:
> 
> http://www.w3schools.com/tags/ref_entities.asp
> 
> Everything below   is dependent on which fonts and character sets
> FF knows about (See View -> Character Encoding).  You run into
> problems when the original was not created using a font FF knows about
> or is not correctly specified in the page.  You can sometimes "fix"
> the problem of a page not correctly specifying the font by forcing FF
> to use the correct font for the page through the View -> Character
> Encoding menu item (e.g., you know what language was used when the
> page was created).

AGGGRRHH  NOO!  Do not cite W3Schools as a reference, it's full of crap,
and that's just one example.

Character entities have *nothing* to do with fonts.  And you can't
arbitrarily say that above or below a certain number will be good or
bad.  Neither does anything rendered in HTML have anything to do with
fonts.  You can change fonts until you're blue in the face, but the code
for the letter "a" represents the letter "a", and nothing else, whether
the font draws it as an "a" or as some other symbol (e.g. the abuse of
fonts done by Wingdings).  You don't fix erroneous editing by trying to
make a browser use a font that's stuffed up to suit the author.

Language is another red herring.  In English, German, Italian, even
American (sarcastic dig intended), "a" is "a".  They can all use ASCII,
if it carries enough characters for what's typed.  I can use ISO-8859-1
or ISO-8859-15 for much of what I type.  They can use character
encodings that aren't typical for their language.  And so on...

It's character encoding that's important (e.g. UTF-8, ISO-8859-1, etc.):
specifying which encoding you used, so that the browser knows what it's
supposed to display.

When a user types "a" on their keyboard, their editor inserts a code into
the document that represents "a".  In ASCII, ISO-8859-1, UTF-8, and several
others, that's character code 97 (decimal).  The document reader needs
to know which encoding scheme was used, so that it can translate
character 97 into the local scheme (even if it happens to be the same
one), and show you an "a".
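
To make that concrete, here is a small Python sketch (my own illustration,
using only the codecs that ship with Python).  It encodes "a" under a few
encodings and looks at the resulting byte.  EBCDIC (cp037) is thrown in as
an assumption-free counterexample of a scheme where "a" is not 97, which is
why the reader has to be told which scheme was used.

```python
# Encode the letter "a" under several encodings and inspect the byte value.
encodings = ["ascii", "iso-8859-1", "iso-8859-15", "utf-8", "cp1252"]
codes = {name: "a".encode(name)[0] for name in encodings}

for name, value in codes.items():
    print(f"{name:12} -> {value}")

# Not every scheme agrees: in EBCDIC (code page 037) "a" is a different number.
ebcdic_code = "a".encode("cp037")[0]
print(f"{'cp037':12} -> {ebcdic_code}")

# The reader needs the encoding name to turn the number back into "a".
decoded = bytes([97]).decode("iso-8859-1")
print(decoded)
```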

Entities are a different thing, again.  They're the character number in
the HTML character set.  It's coincidental that some entity numbers
match some other encoding schemes (e.g. ASCII).  Incorporating, say,
&#95; into a page is using character 95 from the HTML character set, not
character 95 from ASCII or any other set.
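
A quick way to see this, sketched in Python with the standard html module
(the sample page_bytes string is made up for illustration): the same
entity-laden bytes are decoded under several different encodings, and the
entities still resolve to the same characters every time.

```python
import html

# Hypothetical page fragment containing only ASCII bytes plus numeric entities.
page_bytes = b"&#95; and &#8220;quoted&#8221;"

# Decode the file under different encodings, then resolve the entities.
results = {
    enc: html.unescape(page_bytes.decode(enc))
    for enc in ("ascii", "iso-8859-1", "utf-8", "cp1252")
}

for enc, text in results.items():
    print(f"{enc:12} -> {text}")

# &#95; is the underscore, and &#8220; is the left curly quote (U+201C),
# whichever encoding the file itself was declared to be in.
first_char = results["ascii"][0]
quote_code = ord(results["ascii"][6])
```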

In the HTML set, character number [whatever] is *always* character
number [whatever], all the time, regardless of what encoding is used in
the text file (ISO-8859, etc.).  It always has the same meaning, unlike
other schemes which may each have different characters for character
number 132, for example.  It doesn't even have to be the same number as
in whatever character encoding was used to send the information (fancy
quote marks are a prime example of this: they don't exist in ASCII or
ISO-8859-1, and they sit at a different position in Win-1252 than in the
HTML character set, which is Unicode, for all intents and purposes).
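
Both points can be checked with a short Python sketch (again my own
illustration; the try_encode helper is a hypothetical name, and cp437 is
just one extra encoding picked for contrast).  Byte 132 means something
different in each encoding, while the left curly quote has one fixed
number in the HTML/Unicode set and a different byte, or none at all, in
the file encodings.

```python
# Byte 132 decodes to a different character depending on the encoding.
raw = bytes([132])
decoded = {enc: raw.decode(enc) for enc in ("cp1252", "iso-8859-1", "cp437")}
for enc, ch in decoded.items():
    print(f"{enc:12} -> U+{ord(ch):04X}")

# The left curly quote is always number 8220 in the HTML/Unicode set.
left_quote = "\u201c"
print(ord(left_quote))


def try_encode(ch, enc):
    """Return the encoded bytes, or None if the encoding lacks the character."""
    try:
        return ch.encode(enc)
    except UnicodeEncodeError:
        return None


# Missing from ASCII and ISO-8859-1; a single byte (147) in Windows-1252.
quote_bytes = {
    enc: try_encode(left_quote, enc)
    for enc in ("ascii", "iso-8859-1", "cp1252")
}
for enc, value in quote_bytes.items():
    print(f"{enc:12} -> {value}")
```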

-- 
(This box runs FC6, my others run FC4 & FC5, in case that's
 important to the thread.)

Don't send private replies to my address, the mailbox is ignored.
I read messages from the public lists.