Mailing List archive
[linux-dvb] Re: DVB character coding...
Robert Schlabbach wrote:
> From: "Jesper Sörensen" <jesper@datapartner.se>
>> The wording in Annex A isn't that good, and I had some problems figuring
>> out what they meant too. Anyway, I wouldn't look too carefully at those
>> tables. I think what they mean is that unless some other coding is
>> specified, you should use Latin-1 (ISO 8859-1), which makes sense since it
>> is the most widely used coding in the West and on the Net. 0xE9 will
>> then indeed be mapped to "é", as expected.
>
> I think the wording is pretty clear:
>
> | Annex A (normative):
> | Coding of text characters
> [...]
> | if the first byte of the text field has a value in the range "0x20"
> | to "0xFF" then this and all subsequent bytes in the text item are
> | coded using the default character coding table (table 00 - Latin
> | alphabet) of figure A.1
>
> Figure A.1 is a superset of ISO/IEC 6937, *not* any of the ISO/IEC 8859-x
> tables. Using this table, the character "é" would have to be composed from
> the sequence 0xC2 0x65.
>
> Note that this is a _normative_ annex, i.e. it is part of the standard,
> not an option. It does appear, though, that not even professional tools
> properly implement character encoding/decoding that fully complies with
> this standard...
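To make the 0xC2 0x65 composition concrete, here is a minimal, deliberately partial Python sketch of decoding with the default table. It rests on assumptions: bytes 0x20-0x7E are treated as plain ASCII (an approximation of figure A.1), only a handful of the ISO/IEC 6937 non-spacing diacritics in the 0xC1-0xCF range are mapped, and the name `decode_default_table` is hypothetical. The key step is that ISO/IEC 6937 sends the accent *before* the letter, while Unicode puts the combining mark *after* it, so the pair is swapped and then normalized to NFC.

```python
import unicodedata

# Partial map of ISO/IEC 6937 non-spacing diacritics to Unicode combining
# characters. Only a few common entries are listed in this sketch.
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCF: "\u030C",  # caron
}


def decode_default_table(data: bytes) -> str:
    """Decode text coded with (a subset of) the figure A.1 default table.

    Handles printable ASCII (0x20-0x7E) and the diacritics listed above;
    any other byte raises ValueError.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in DIACRITICS:
            if i + 1 >= len(data):
                raise ValueError("diacritic at end of text")
            base = data[i + 1]
            if not 0x20 <= base <= 0x7E:
                raise ValueError(f"unsupported base byte 0x{base:02X}")
            # Accent first in ISO 6937, combining mark last in Unicode.
            out.append(chr(base) + DIACRITICS[b])
            i += 2
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} not handled in this sketch")
    # NFC folds "e" + U+0301 into the precomposed "é" (U+00E9).
    return unicodedata.normalize("NFC", "".join(out))


print(decode_default_table(b"Caf\xc2\x65"))  # Café
```

A full decoder would also need the non-ASCII spacing characters of figure A.1 and the selector bytes below 0x20, which this sketch rejects.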
Yeah, I don't know whether it's the standard or the implementations that are
broken. All I can tell you is that my DVB feed looks the same as yours: it
uses Latin-1 and doesn't have any "charset escape". Maybe I'm just
stupid, but it doesn't make much sense to me to include all the other
8859-x encodings and not Latin-1. AFAIK 8859-15 is mostly the same
as Latin-1, but not quite (it has the euro sign at 0xA4, where Latin-1 has "¤")...
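The "mostly the same, but not quite" part is easy to check. The sketch below uses only Python's built-in `iso8859_1` and `iso8859_15` codecs to list every byte that decodes differently; both tables agree on 0x00-0x9F, so only the upper half is scanned. The function name is just for illustration.

```python
def latin1_vs_latin9() -> list[tuple[int, str, str]]:
    """Return (byte, ISO 8859-1 char, ISO 8859-15 char) where they differ."""
    diffs = []
    for code in range(0xA0, 0x100):
        raw = bytes([code])
        l1 = raw.decode("iso8859_1")
        l9 = raw.decode("iso8859_15")
        if l1 != l9:
            diffs.append((code, l1, l9))
    return diffs


for code, l1, l9 in latin1_vs_latin9():
    print(f"0x{code:02X}: {l1!r} (8859-1) -> {l9!r} (8859-15)")
```

It reports eight positions (0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD and 0xBE). Everything else, including "é" at 0xE9, is identical, which is why a feed sent as Latin-1 mostly looks right under either decoder.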
BTW, do you happen to know what they mean when they say the following
(WRT 16-bit codings):
* if the first byte of the text field has a value "0x10" then the
following two bytes carry a 16-bit value (uimsbf) N
to indicate that the remaining data of the text field is coded using the
character code table specified by
ISO Standard 8859, parts 1 to 9;
What does that mean? High byte selects table and low byte has the
character code?
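One plausible reading, and the assumption behind the sketch below, is that N is a table selector rather than a per-character value: the rest of the field is ordinary single-byte ISO 8859 text, and N is simply the part number (0x0001 for part 1, 0x0005 for part 5, and so on). Later editions of EN 300 468 appear to spell it out that way (first byte 0x00, second byte the part number), but treat this as an assumption for the version quoted above. The function name is hypothetical; the codecs are Python's built-in `iso8859_1` to `iso8859_9`.

```python
def decode_8859_escape(field: bytes) -> str:
    """Decode a text field that starts with the 0x10 escape.

    Assumes the 16-bit value N after 0x10 (uimsbf, i.e. most significant
    byte first) is the ISO 8859 part number for the rest of the field.
    """
    if len(field) < 3 or field[0] != 0x10:
        raise ValueError("not a 0x10-escaped text field")
    n = int.from_bytes(field[1:3], "big")
    if not 1 <= n <= 9:  # the quoted text only mentions parts 1 to 9
        raise ValueError(f"unsupported ISO 8859 part {n}")
    return field[3:].decode(f"iso8859_{n}")


print(decode_8859_escape(b"\x10\x00\x01Caf\xe9"))  # Café
```

Under this reading, 0x10 0x00 0x01 would be exactly the explicit Latin-1 escape that neither feed seems to send.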
* if the first byte of the text field has a value "0x11" then the
remaining bytes in the text item are coded in pairs
in accordance with the Basic Multilingual Plane of ISO/IEC 10646-1 [8];
Is this UCS-2?
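Byte pairs restricted to the Basic Multilingual Plane are what UCS-2 is, so the answer is very likely yes. The sketch below assumes big-endian pairs (most significant byte first, as with the standard's uimsbf fields; the quoted sentence does not say so explicitly) and rejects code units in the surrogate range 0xD800-0xDFFF, since plain UCS-2 cannot express characters outside the BMP. Without surrogates, the result is the same as decoding with UTF-16BE. The function name is hypothetical.

```python
def decode_bmp_field(field: bytes) -> str:
    """Decode a text field that starts with 0x11 (ISO/IEC 10646-1 BMP)."""
    if not field or field[0] != 0x11:
        raise ValueError("not a 0x11 text field")
    body = field[1:]
    if len(body) % 2:
        raise ValueError("odd number of bytes after 0x11")
    # Each pair is one 16-bit code unit, high byte first (assumed).
    units = [int.from_bytes(body[i:i + 2], "big") for i in range(0, len(body), 2)]
    if any(0xD800 <= u <= 0xDFFF for u in units):
        raise ValueError("surrogate code unit: not valid UCS-2")
    return "".join(chr(u) for u in units)


print(decode_bmp_field(b"\x11\x00\x43\x00\x61\x00\x66\x00\xe9"))  # Café
```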