This is the mail archive of the cygwin mailing list for the Cygwin project.
Hi Stuart,

On Mar 30 13:04, Corinna Vinschen wrote:
> On Mar 25 14:34, Kyzer wrote:
> > Hello,
> >
> > I've found that if you use cygwin to create a file with badly-encoded
> > UTF-8, readdir() gives out an entry with a name that cygwin won't
> > subsequently accept.
> >
> > * create a file using filename with hex bytes F4 8F BF BF
> > * readdir() reports the filename as hex bytes E2 8E B3 ED BF BF
> > * attempting to open or unlink the filename E2 8E B3 ED BF BF fails
> > * attempting to open or unlink the filename F4 8F BF BF succeeds
>
> Thanks for the testcase. I'll have a look later this week (I hope).

Wow. Just wow. You found a long-standing bug in the wctomb conversion
from UTF-16 to UTF-8.

As you probably know, Unicode values beyond the base plane (that is,
everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation) are
represented as so-called surrogate pairs in UTF-16: two UTF-16 values in
the 0xd800 - 0xdfff range. While the conversion from UTF-8 f4 8f bf bf
to UTF-16 dbff dfff worked fine, the conversion back to UTF-8 has a
subtle bug. There's a test for a lone high surrogate in the underlying
conversion function. This tests the next UTF-16 value like this:

  if (wchar < 0xdc00 || wchar >= 0xdfff)
    /* Handle lone high surrogate */

Notice the >= 0xdfff? That should have been > 0xdfff. Duh. This bug is
only a bit over 5 years old...

Fixed in the git repo. I'll regenerate today's fool..., erm, today's
developer snapshot on https://cygwin.com/snapshots/ later today.

Thanks, especially for the simple testcase,
Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
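To make the off-by-one concrete, here is a minimal C sketch of a UTF-16 to UTF-8 converter. It is not newlib's or Cygwin's actual wctomb code; the names `put_utf8` and `utf16_to_utf8` and the `buggy` switch are hypothetical. U+10FFFF is stored in UTF-16 as the pair dbff dfff (0x10ffff - 0x10000 = 0xfffff; the high 10 bits, 0x3ff, give 0xd800 + 0x3ff = 0xdbff, and the low 10 bits give 0xdc00 + 0x3ff = 0xdfff). With the `>= 0xdfff` test, 0xdfff is the one valid low surrogate that gets rejected, so this particular pair is split and each half is written out on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write code point cp (<= 0x10ffff) as UTF-8 into out
   and return the number of bytes written.  Lone surrogates (0xd800-0xdfff)
   are written as ordinary 3-byte sequences (CESU-8/WTF-8 style). */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    assert(cp <= 0x10ffff);
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xc0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xe0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3f));
        out[2] = (unsigned char) (0x80 | (cp & 0x3f));
        return 3;
    }
    out[0] = (unsigned char) (0xf0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3f));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3f));
    out[3] = (unsigned char) (0x80 | (cp & 0x3f));
    return 4;
}

/* Hypothetical UTF-16 -> UTF-8 converter.  out must hold up to 3 bytes per
   input unit.  If buggy is nonzero, the "is the next unit NOT a low
   surrogate?" test uses the off-by-one ">= 0xdfff" from the mail instead
   of the corrected "> 0xdfff". */
static size_t utf16_to_utf8(const uint16_t *in, size_t n,
                            unsigned char *out, int buggy)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t wc = in[i];
        if (wc >= 0xd800 && wc <= 0xdbff && i + 1 < n) {
            uint32_t next = in[i + 1];
            int not_low = buggy ? (next < 0xdc00 || next >= 0xdfff)
                                : (next < 0xdc00 || next > 0xdfff);
            if (!not_low) {
                /* Valid pair: combine into one code point >= 0x10000. */
                wc = 0x10000 + ((wc - 0xd800) << 10) + (next - 0xdc00);
                i++;
            }
            /* else: lone high surrogate, written out as 3 bytes below */
        }
        len += put_utf8(wc, out + len);
    }
    return len;
}
```

Converting { 0xdbff, 0xdfff } with `buggy` set to 0 yields f4 8f bf bf, the original name. With `buggy` set to 1 it yields ed af bf ed bf bf. The trailing ed bf bf is the lone 0xdfff, matching the end of the name readdir() returned in the report; the first three bytes differ because this sketch simply encodes the lone high surrogate as its own three-byte value and does not try to reproduce how the original code arrived at e2 8e b3.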