[linux-dvb] Re: DVB character coding...

To: linux-dvb@linuxtv.org
Subject: [linux-dvb] Re: DVB character coding...
From: Jesper Sörensen <jesper@datapartner.se>
Date: Tue, 14 Dec 2004 15:40:56 +0100
Content-transfer-encoding: 8bit
Content-type: text/plain; charset=windows-1252; format=flowed
In-reply-to: <001a01c4e1e2$ea42ee90$0200a8c0@powerstation>
References: <000d01c4e178$673295b0$0200a8c0@powerstation> <41BEE30E.3000801@datapartner.se> <001a01c4e1e2$ea42ee90$0200a8c0@powerstation>
Sender: linux-dvb-bounce@linuxtv.org
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

Robert Schlabbach wrote:

From: "Jesper Sörensen" <jesper@datapartner.se>

The wording in annex A isn't that good and I had some problems figuring
out what they meant too. Anyway, I wouldn't look too carefully at those
tables. I think what they mean is that unless some other coding is
specified you should use Latin-1 (ISO 8859-1) which makes sense since it
is the most widely used coding in the West and on the net. The 0xE9 will
then indeed map to "é" as expected.

I think the wording is pretty clear:

| Annex A (normative):
| Coding of text characters
[...]
| if the first byte of the text field has a value in the range "0x20"
| to "0xFF" then this and all subsequent bytes in the text item are
| coded using the default character coding table (table 00 - Latin
| alphabet) of figure A.1

Figure A.1 is a superset of ISO/IEC 6937, *not* any of the ISO/IEC 8859-x
tables. Using this table, the character "é" would have to be composed with
the sequence 0xC2 0x65.
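
To make the composition rule concrete, here is a minimal Python sketch of
how a decoder might handle such a sequence. It is not a full figure A.1
decoder: the table lists only a few ISO/IEC 6937 non-spacing diacritics,
bytes 0x20 to 0x7E are assumed to match ASCII, and the name
decode_6937_sketch is made up for this illustration.

```python
import unicodedata

# Partial, illustrative table: ISO/IEC 6937 non-spacing diacritic byte
# mapped to the matching Unicode combining mark. Not the complete set.
NON_SPACING = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
}


def decode_6937_sketch(data: bytes) -> str:
    """Decode a tiny subset of ISO/IEC 6937 style text (hypothetical helper).

    The diacritic byte comes BEFORE its base letter, while a Unicode
    combining mark comes AFTER it, so the mark is held back until the
    base character has been emitted.
    """
    out = []
    pending = ""  # combining mark waiting for its base character
    for b in data:
        if b in NON_SPACING:
            pending = NON_SPACING[b]
        elif 0x20 <= b <= 0x7E:
            # Assumption: this range is treated as plain ASCII here.
            out.append(chr(b) + pending)
            pending = ""
        else:
            # Anything else is outside this sketch's table.
            out.append("\ufffd")
            pending = ""
    # NFC folds "e" + combining acute into the single code point "é".
    return unicodedata.normalize("NFC", "".join(out))


print(decode_6937_sketch(b"caf\xc2e"))
```

With this sketch a bare 0xE9 byte does not come out as "é"; only the
two-byte sequence 0xC2 0x65 does.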

Note that this is a _normative_ Annex, i.e. this is part of the standard,
not an option. It does appear, though, that not even professional tools
properly implement character encoding/decoding that fully complies with
this standard...

Yeah, I don't know if it's the standard or the implementations that are
broken. I can only tell you that my DVB feed looks the same as yours. It
uses Latin-1 and it doesn't have any "charset escape". Maybe I'm just
stupid but it doesn't make much sense to me to include all the other
8859-x encodings and not have Latin-1. AFAIK 8859-15 is mostly the same
as Latin-1 but not quite...
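
For what it's worth, the "mostly the same but not quite" part is easy to
check with Python's built-in codecs. A small sketch (differing_bytes is
just a name picked for this example):

```python
def differing_bytes(codec_a="latin-1", codec_b="iso8859_15"):
    """Return (byte value, char in codec_a, char in codec_b) for every
    single byte that the two codecs decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        a = raw.decode(codec_a)
        b = raw.decode(codec_b)
        if a != b:
            diffs.append((value, a, b))
    return diffs


for value, a, b in differing_bytes():
    print(f"0x{value:02X}: {a!r} in Latin-1, {b!r} in 8859-15")
```

Only a handful of positions differ (0xA4 becomes the euro sign, for
instance), and 0xE9 is "é" in both tables.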

BTW, do you happen to know what they mean when they say the following (WRT 16-bit codings):

* if the first byte of the text field has a value "0x10" then the
  following two bytes carry a 16-bit value (uimsbf) N to indicate that
  the remaining data of the text field is coded using the character
  code table specified by ISO Standard 8859, parts 1 to 9;

What does that mean? Does the high byte select the table and the low byte carry the character code?
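
One way to read the quoted sentence is that the two bytes together form
a single number N, and that N picks the ISO 8859 part for the whole rest
of the field, which then stays one byte per character. A Python sketch
under that reading (decode_0x10_field is a hypothetical name, and the
limit of parts 1 to 9 is taken from the quoted text only):

```python
def decode_0x10_field(data: bytes) -> str:
    """Decode a text field that starts with the 0x10 selector byte.

    Assumed layout: 0x10, then a 16-bit big-endian value N, then
    single-byte characters coded in ISO 8859 part N.
    """
    if len(data) < 3 or data[0] != 0x10:
        raise ValueError("not a 0x10-prefixed text field")
    # "uimsbf" = unsigned integer, most significant bit first.
    n = int.from_bytes(data[1:3], "big")
    if not 1 <= n <= 9:
        raise ValueError(f"ISO 8859 part {n} is outside the quoted range")
    # Some parts leave byte values undefined; replace those instead of failing.
    return data[3:].decode(f"iso8859_{n}", errors="replace")


print(decode_0x10_field(b"\x10\x00\x01caf\xe9"))
```

Under this reading 0x10 0x00 0x01 would be the explicit way to ask for
Latin-1, and the character bytes themselves are not paired up.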

* if the first byte of the text field has a value "0x11" then the
  remaining bytes in the text item are coded in pairs in accordance
  with the Basic Multilingual Plane of ISO/IEC 10646-1 [8];

Is this UCS-2?
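
If it is, and assuming the byte pairs are big-endian, a decoder could
look roughly like the Python sketch below. Python has no codec named
UCS-2, so "utf-16-be" stands in for it here; the two agree for
characters in the Basic Multilingual Plane. The name decode_0x11_field
is made up for this example.

```python
def decode_0x11_field(data: bytes) -> str:
    """Decode a text field that starts with the 0x11 selector byte.

    Assumption: the remaining bytes are 16-bit code units, high byte
    first, each one a Basic Multilingual Plane character.
    """
    if not data or data[0] != 0x11:
        raise ValueError("not a 0x11-prefixed text field")
    payload = data[1:]
    if len(payload) % 2:
        raise ValueError("payload is not a whole number of byte pairs")
    # utf-16-be matches UCS-2 for BMP characters; surrogate pairs would
    # also be accepted here, which strict UCS-2 would not allow.
    return payload.decode("utf-16-be")


print(decode_0x11_field(b"\x11\x00c\x00a\x00f\x00\xe9"))
```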
