This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: [Fwd: [1.7] wcwidth failing configure tests]


On May 13 20:04, Andy Koppe wrote:
> 2009/5/12 Corinna Vinschen:
> >> Trouble is, there's the thorny issue of the "CJK Ambiguous Width"
> >> category of characters, which consists of things like Greek and
> >> Cyrillic letters as well as line drawing symbols. Those have a width
> >> of 1 in Western use, yet with CJK fonts they have a width of 2. That's
> >> why Markus Kuhn's code includes the mk_wcswidth_cjk() variant.
> >
> > We should use the standard variation alone, imho.
> 
> I'm not sure that CJK users would be happy with that. See MinTTY issue
> 88 for my misguided attempts to dismiss this as a legacy issue:
> http://code.google.com/p/mintty/issues/detail?id=88
> 
> In comment 8 on that, "deenheart" mentioned that he was working on a
> fix for wcwidth(). I don't know what he had in mind, but I'd suspect
> something based on an environment variable setting.
> 
> > And we need some workaround for UTF-16 systems like Cygwin.
> > Unfortunately, surrogate pairs only work well as part of a string, not
> > as standalone chars.  So wcwidth would return -1 for each single char,
> > but wcswidth could be tweaked to handle them gracefully.
> 
> Looking at the ranges in wcwidth.c, it might be possible to decide the
> width of a surrogate pair based on the high surrogate only, and then
> treat the low surrogate as a combining character with length 0.

How should that work?  The first half of the surrogate pair does not
carry enough information to decide that.  For instance, take the ranges
{ 0x10A01, 0x10A03 }, { 0x10A05, 0x10A06 }.  The low 10 bits of the
Unicode value are in the second half of the pair.  From the first half
alone you can't tell whether the char is 0x10A04 or one of the others.
So you need both halves to make a decision.
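
To make that concrete, here is a minimal sketch of the standard UTF-16
split.  The helper names high_surrogate() and low_surrogate() are made
up for illustration; they are not from wcwidth.c or newlib.

```c
#include <assert.h>
#include <stdint.h>

/* High (first) half of the surrogate pair for a code point in
   U+10000..U+10FFFF.  It carries only the upper 10 bits of the
   offset from 0x10000.  Hypothetical helper, for illustration. */
static uint16_t
high_surrogate (uint32_t cp)
{
  assert (cp >= 0x10000 && cp <= 0x10FFFF);
  return (uint16_t) (0xD800 + ((cp - 0x10000) >> 10));
}

/* Low (second) half.  It carries the lower 10 bits of the offset. */
static uint16_t
low_surrogate (uint32_t cp)
{
  assert (cp >= 0x10000 && cp <= 0x10FFFF);
  return (uint16_t) (0xDC00 + ((cp - 0x10000) & 0x3FF));
}
```

With these, 0x10A03, 0x10A04 and 0x10A05 all get the high surrogate
0xD802 and differ only in the low surrogate (0xDE03, 0xDE04, 0xDE05),
so the high half cannot tell them apart.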

A lone half of a surrogate pair is also always invalid.  That's something
you can't handle in wcwidth.
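
A string-level function, on the other hand, sees both halves.  The
following is only a rough sketch of how that could look, not the actual
Cygwin or newlib code.  utf16_swidth() and toy_width() are hypothetical
names, and toy_width() is a stand-in that knows nothing but the two
ranges mentioned above (treated as zero-width); everything else counts
as width 1.

```c
#include <stddef.h>
#include <stdint.h>

static int is_high (uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low (uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Toy per-code-point width (assumption, NOT a real table): the two
   ranges from the example are zero-width, all else is width 1. */
static int
toy_width (uint32_t cp)
{
  if ((cp >= 0x10A01 && cp <= 0x10A03) || (cp >= 0x10A05 && cp <= 0x10A06))
    return 0;
  return 1;
}

/* Width of at most n UTF-16 units, stopping at a NUL unit.  Returns -1
   if a surrogate half appears without its partner. */
static int
utf16_swidth (const uint16_t *s, size_t n)
{
  int total = 0;

  for (size_t i = 0; i < n && s[i] != 0; i++)
    {
      uint32_t cp = s[i];

      if (is_high (s[i]))
        {
          /* A high half must be followed by a low half. */
          if (i + 1 >= n || !is_low (s[i + 1]))
            return -1;
          cp = 0x10000 + (((uint32_t) s[i] - 0xD800) << 10)
               + ((uint32_t) s[i + 1] - 0xDC00);
          i++;                  /* consume the low half as well */
        }
      else if (is_low (s[i]))
        return -1;              /* low half without a preceding high half */

      total += toy_width (cp);
    }
  return total;
}
```

The width is decided only after both halves have been combined into one
code point, and a lone half makes the whole call fail.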


Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

