[vdr] vdr-1.3.27 and UTF-8
Sergei Haller
Sergei.Haller at math.uni-giessen.de
Wed Jul 20 19:12:23 CEST 2005
On Wed, 20 Jul 2005, Klaus Schmidinger (KS) wrote:
> >
> > I think the confusion comes from the assumption that a character is
> > exactly one byte long.
> >
> > strlen counts bytes, not characters.
> > In UTF-8 a character can be up to 4 bytes long (up to 6 in the original definition).
> >
> > IIRC, there are new functions to count characters (wcslen, wcscmp,
> > etc.)
>
> Aren't you confusing this with "wide character" functions?
Yes, I am talking about wide characters. I don't think I am confusing
anything (correct me if I'm wrong).
From the glibc manual:
> Introduction to Extended Characters
>
> A variety of solutions is available to overcome the differences between
> character sets with a 1:1 relation between bytes and characters and
> character sets with ratios of 2:1 or 4:1. [...]
>
> As shown in some other part of this manual, a completely new family has
> been created of functions that can handle wide character texts in
> memory. The most commonly used character sets for such internal wide
> character representations are Unicode and ISO 10646 [...] Unicode was
> originally planned as a 16-bit character set; whereas, ISO 10646 was
> designed to be a 31-bit large code space. [...]
>
> UTF-8 is an ASCII compatible encoding where ASCII characters are
> represented by ASCII bytes and non-ASCII characters by sequences of 2-6
> non-ASCII bytes [...]
>
> To represent wide characters the char type is not suitable.
> For this reason the ISO C standard introduces [...] wchar_t,
> [...]
Sergei
--
-------------------------------------------------------------------- -?)
eMail: Sergei.Haller at math.uni-giessen.de /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
-- Mark Twain