This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: bug in mbrtowc?


On Jul 28 13:33, Andy Koppe wrote:
> 2009/7/28 Corinna Vinschen:
> >> >> Trouble is, the hack will also only work correctly if the whole UTF-8
> >> >> sequence for the non-BMP character is passed at once. If you pass the
> >> >> bytes one-by-one instead, and assuming the bug above wasn't there,
> >> >> you'd get this:
> >> >
> >> > Yes, I know. The real trouble is, I don't know how that can be fixed
> >> > in a still sort-of-POSIXy way.
> >>
> >> The way I'd suggested is sort-of-POSIXy, but perhaps not enough,
> >> because apps that check the mbrtowc() return code (and not the written
> >> wc) against zero will interpret a low surrogate as string end. An
> >> alternative might be to just return an error when there's no compliant
> >> way to return the low surrogate. Do you think either of these are
> >> worth pursuing?
> >
> > I'm thinking of faking a valid return of 1 (or 2, or 3) after the third byte
> > has been read. Three bytes are sufficient to create the first surrogate
> > half in wc.
> 
> Great idea!
> 
> I wouldn't even say it's fake, because as you say, you definitely have
> a high surrogate after three bytes. So just return the number of bytes
> actually used. It's also valid to leave it in a non-initial state
> after that; consider it the surrogate shift state or some such. And if
> the first byte in the next call isn't actually a valid fourth byte,
> just return an error.

I proposed a patch:

http://sourceware.org/ml/newlib/2009/msg00781.html
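
For illustration only, here is a minimal sketch of why three bytes are
enough for the high surrogate. It is not the patch linked above; the
helper names are hypothetical, and it assumes the input has already been
validated as a well-formed 4-byte UTF-8 sequence (the asserts only check
the lead byte). The high surrogate depends only on the top 11 bits of the
code point, all of which are present in the first three bytes; the low
surrogate needs the remaining bits of the third byte plus the fourth byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive the UTF-16 high surrogate from the first
   three bytes of a 4-byte UTF-8 sequence (11110uuu 10uuzzzz 10yyyyyy). */
static uint16_t high_surrogate_from_3(const unsigned char b[3])
{
    assert((b[0] & 0xF8) == 0xF0);              /* 4-byte lead byte only */

    /* top == code point >> 10 (the upper 11 bits of the 21-bit value) */
    uint32_t top = ((uint32_t)(b[0] & 0x07) << 8)
                 | ((uint32_t)(b[1] & 0x3F) << 2)
                 | ((uint32_t)(b[2] & 0x3F) >> 4);

    /* Subtracting 0x10000 from the code point is the same as
       subtracting 0x40 from its upper bits. */
    return (uint16_t)(0xD800 + (top - 0x40));
}

/* Hypothetical helper: derive the low surrogate once the fourth byte
   arrives. Only the low 4 payload bits of byte 3 are still needed. */
static uint16_t low_surrogate_from_4(const unsigned char b[4])
{
    assert((b[0] & 0xF8) == 0xF0);

    uint32_t low10 = ((uint32_t)(b[2] & 0x0F) << 6)
                   | (uint32_t)(b[3] & 0x3F);
    return (uint16_t)(0xDC00 + low10);
}
```

So a conversion function in this style could hand back the high surrogate
after consuming three bytes, keep the low 4 bits of the third byte in its
state object, and produce the low surrogate when the fourth byte shows up.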


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

