[vdr] vdr-1.3.27 and UTF-8

Sergei Haller Sergei.Haller at math.uni-giessen.de
Wed Jul 20 19:12:23 CEST 2005


On Wed, 20 Jul 2005, Klaus Schmidinger (KS) wrote:

> > 
> > I think the confusion comes from the assumption that a character is
> > exactly one byte long.
> > 
> > strlen counts bytes not characters. 
> > in utf-8 a character can be up to 4 (or was it 8) bytes long.
> > 
> > IIRC, there are new functions to count characters (wstrlen, wstrcmp,
> > etc.)
> 
> Aren't you confusing this with "wide character" functions?

Yes, I am talking about wide characters. I don't think I am confusing 
anything (correct me if I'm wrong).
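
To make the byte/character distinction concrete, here is a minimal sketch
in C. The helper names utf8_codepoints and utf8_continuation_bytes are made
up for illustration (they are not libc functions), and the code assumes the
input is already valid UTF-8; it does no validation. It simply counts every
byte that is not a continuation byte (bit pattern 10xxxxxx). As for the
maximum length: UTF-8 as originally specified allowed sequences of up to
6 bytes, while RFC 3629 restricts it to at most 4.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a UTF-8 string.
 * Assumes valid UTF-8; every byte that is NOT a continuation
 * byte (10xxxxxx) starts a new code point. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Hypothetical helper: how many bytes strlen reports beyond the
 * number of code points, i.e. the number of continuation bytes. */
size_t utf8_continuation_bytes(const char *s)
{
    return strlen(s) - utf8_codepoints(s);
}
```

For the word "Grüße" encoded in UTF-8, strlen returns 7, while 
utf8_codepoints returns 5, because ü and ß each take two bytes.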

From the glibc manual:

> Introduction to Extended Characters
> 
> A variety of solutions is available to overcome the differences between 
> character sets with a 1:1 relation between bytes and characters and 
> character sets with ratios of 2:1 or 4:1. [...]
> 
> As shown in some other part of this manual, a completely new family has 
> been created of functions that can handle wide character texts in 
> memory. The most commonly used character sets for such internal wide 
> character representations are Unicode and ISO 10646 [...] Unicode was 
> originally planned as a 16-bit character set; whereas, ISO 10646 was 
> designed to be a 31-bit large code space. [...]
>
> UTF-8 is an ASCII compatible encoding where ASCII characters are 
> represented by ASCII bytes and non-ASCII characters by sequences of 2-6 
> non-ASCII bytes [...]
>
> To represent wide characters the char type is not suitable.
> For this reason the ISO C standard introduces [...] wchar_t,
> [...]
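
A rough sketch of what the wchar_t route looks like in practice. Note that
the standard names are wcslen and wcscmp (declared in <wchar.h>), not
wstrlen and wstrcmp as I wrote from memory. The helper names mb_char_count
and greeting_length are hypothetical. Calling mbstowcs with a NULL
destination to obtain only the length is the behaviour documented by
POSIX and glibc; I am assuming such a platform here. The result depends on
the current locale, so a program that wants UTF-8 decoding has to call
setlocale(LC_CTYPE, "") first and run in a UTF-8 locale.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: number of characters in a multibyte string,
 * decoded according to the current locale's LC_CTYPE.
 * Returns (size_t)-1 if the string contains an invalid sequence.
 * Relies on the POSIX/glibc behaviour of mbstowcs with a NULL
 * destination (count only, store nothing). */
size_t mb_char_count(const char *mb)
{
    return mbstowcs(NULL, mb, 0);
}

/* wcslen counts wchar_t elements, not bytes. The universal
 * character names \u00fc and \u00df stand for u-umlaut and sharp s,
 * so this does not depend on the source file encoding. */
size_t greeting_length(void)
{
    static const wchar_t greeting[] = L"Gr\u00fc\u00dfe";

    return wcslen(greeting);
}
```

Here greeting_length returns 5 regardless of how many bytes the same text
would occupy in UTF-8. In the default "C" locale mb_char_count only handles
plain ASCII reliably, which is exactly why the locale setup matters.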



        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller at math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain


