This is the mail archive of the cygwin mailing list for the Cygwin project.
bug in mbrtowc?
- From: Andy Koppe <andy dot koppe at gmail dot com>
- To: Cygwin Tech List <cygwin at cygwin dot com>
- Date: Mon, 27 Jul 2009 22:56:34 +0100
- Subject: bug in mbrtowc?
I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
Here's an example:
#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
  wchar_t wc = 0;
  const char *loc = setlocale(LC_CTYPE, "en_GB.UTF-8");
  puts(loc ? loc : "(setlocale failed)");
  /* A null mbstate_t pointer makes mbrtowc use its own internal state. */
  /* mbrtowc returns size_t, so cast to match the %i conversion. */
  printf("%i\n", (int)mbrtowc(&wc, "\xe2", 1, 0));
  printf("%i\n", (int)mbrtowc(&wc, "\x94", 1, 0));
  printf("%i\n", (int)mbrtowc(&wc, "\x84", 1, 0));
  printf("%x\n", (unsigned)wc);
  return 0;
}
The sequence E2 94 84 should translate to U+2504: the first two calls
should return -2 (incomplete character) and the third should complete
it. Instead, the second and third calls to mbrtowc report encoding
errors. It does work correctly if the three bytes are passed to
mbrtowc() in one go:
printf("%i\n", (int)mbrtowc(&wc, "\xe2\x94\x84", 3, 0));
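For reference, here is a minimal sketch of the byte-at-a-time behaviour I would expect, written without any locale dependency. It is not the Cygwin or newlib implementation: utf8_state, utf8_feed and the UTF8_* constants are hypothetical names made up for illustration, the state struct merely stands in for mbstate_t, and the sketch does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state, standing in for mbstate_t. */
typedef struct {
    uint32_t cp;   /* code point bits accumulated so far */
    int pending;   /* continuation bytes still expected */
} utf8_state;

/* Hypothetical return codes, loosely modelled on mbrtowc's 1 / -2 / -1. */
enum { UTF8_DONE = 1, UTF8_INCOMPLETE = -2, UTF8_INVALID = -1 };

/* Feed one byte; on UTF8_DONE the decoded code point is stored in *out. */
static int utf8_feed(utf8_state *st, unsigned char b, uint32_t *out)
{
    assert(st != NULL && out != NULL);
    if (st->pending == 0) {
        if (b < 0x80) {                 /* single-byte (ASCII) character */
            *out = b;
            return UTF8_DONE;
        }
        if ((b & 0xE0) == 0xC0) {        /* lead byte of a 2-byte sequence */
            st->cp = b & 0x1F;
            st->pending = 1;
        } else if ((b & 0xF0) == 0xE0) { /* lead byte of a 3-byte sequence */
            st->cp = b & 0x0F;
            st->pending = 2;
        } else if ((b & 0xF8) == 0xF0) { /* lead byte of a 4-byte sequence */
            st->cp = b & 0x07;
            st->pending = 3;
        } else {
            return UTF8_INVALID;         /* stray continuation or bad lead */
        }
        return UTF8_INCOMPLETE;
    }
    if ((b & 0xC0) != 0x80) {            /* expected a 10xxxxxx byte */
        st->pending = 0;
        return UTF8_INVALID;
    }
    st->cp = (st->cp << 6) | (b & 0x3F); /* append six payload bits */
    if (--st->pending > 0)
        return UTF8_INCOMPLETE;
    *out = st->cp;
    return UTF8_DONE;
}
```

Feeding E2, 94 and 84 in turn to a zero-initialised utf8_state gives UTF8_INCOMPLETE twice and then UTF8_DONE with the code point 0x2504, which is the pattern I would expect from the three separate mbrtowc calls above.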
Andy