Mailing List archive


[linux-dvb] Re: DVB character coding...



From: "Johannes Stezenbach" <js@linuxtv.org>
> Hm, for 0x13 I found the following comment in our code:
>
> FIXME: document 595.doc on dvb.org states:
> 1. If the value of leading byte is "0x13"; then the remaining bytes
>    are coded in pairs with the Big5 subset of Unicode 3.0. This Big5
>    subset can be round-trip transcoded to the Big5 character standard [5]
>    without loss of information. This Big5 subset of Unicode 3.0 contains
>    all 13,053 characters of Big5 character standard [5].
>
> (I don't know who wrote that or what 595.doc is.)

Sounds like it came from a draft... In the published standard, coding 0x13
is the Simplified Chinese character set GB-2312-1980, whereas the Big5
subset of ISO/IEC 10646-1 is now coding 0x14.

I found that the original DVB BlueBook as well as early versions of ETSI EN
300 468 only specified codings 0x01 through 0x05, 0x10 and 0x11. So
0x12/0x13/0x14 were added later, and that document may have been a draft
that was later changed.

But at least it does shed some light on what the description "Big5 subset
of ISO/IEC 10646-1 [8] for use with Traditional Chinese" in ETSI EN 300 468
V1.6.1 is supposed to mean - if your draft is correct, the coding is
Unicode and not Big5 as someone suggested, i.e. encoded in byte pairs and
not in bytes with escape sequences. So you treat it just like coding 0x11.
The only difference is that the characters used in this coding are
guaranteed to be transcodable to Big5, whereas coding 0x11 could use
16-bit character codes that cannot be transcoded to Big5...
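
To illustrate that reading, here is a minimal Python sketch. It assumes the
byte pairs are big-endian (most significant byte first) and that the text
field starts with the one-byte selector; the function names are made up for
this example and do not come from any existing code.

```python
def decode_dvb_byte_pairs(data: bytes) -> str:
    """Decode a DVB text field that uses selector 0x11 or 0x14.

    Assumption: both codings carry 16-bit ISO/IEC 10646-1 code units,
    big-endian, directly after the selector byte.
    """
    if not data:
        return ""
    selector, payload = data[0], data[1:]
    if selector not in (0x11, 0x14):
        raise ValueError("not a byte-pair coding: 0x%02x" % selector)
    if len(payload) % 2 != 0:
        raise ValueError("payload is not a whole number of byte pairs")
    # For code points in the Basic Multilingual Plane, UTF-16BE and
    # 16-bit big-endian 10646-1 code units are the same bytes.
    return payload.decode("utf-16-be")


def is_big5_transcodable(text: str) -> bool:
    """Check whether every character maps to Python's 'big5' codec."""
    try:
        text.encode("big5")
    except UnicodeEncodeError:
        return False
    return True
```

With this, a 0x14 string and a 0x11 string take the same decoding path; the
Big5 check only matters if you want to verify the guarantee that coding 0x14
is supposed to give. Note that Python's 'big5' codec may not match the exact
Big5 repertoire the standard refers to.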

What is confusing, though, is that ETSI EN 300 468 does not specifically
mention that coding 0x14 is coded in byte pairs, whereas it does say so for
coding 0x11...

Regards,
-- 
Robert Schlabbach
e-mail: robert_s@gmx.net
Berlin, Germany





