This is the mail archive of the cygwin-developers mailing list for the Cygwin project.
Re: "C" character set (again)
According to Andy Koppe on 12/28/2009 11:54 PM:
> Following the "printf treats differently a string constant and a
> character array" issue at
> http://cygwin.com/ml/cygwin/2009-12/msg01009.html, I'm wondering again
> whether the "C" locale shouldn't go back to using ASCII rather than
> UTF-8, to avoid surprises like that and also to fit with many people's
> expectation that "C" means ASCII. I think that would save us a bunch
> of trouble and pointless legal/religious discussions about the C
> locale.
Bytes with the 8th bit set are not portable in the C locale, regardless of
whether that locale uses ASCII or UTF-8 encoding. Yes, we will have to
field complaints from users with non-portable programs. But I don't think
we have to change back to ASCII - we are doing those users a service by
making them fix their portability bugs.
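
To make the portability point concrete, here is a minimal sketch of a check
a program could run on its data before handing it to locale-dependent
functions. The helper name first_high_byte is hypothetical, not an existing
Cygwin or newlib function.

```c
#include <stddef.h>

/* Hypothetical helper: return the index of the first byte in s[0..n)
   that has the 8th bit set, or -1 if no such byte exists.  Bytes with
   the 8th bit set are the ones that are not portable in the C locale. */
long first_high_byte(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if ((unsigned char)s[i] & 0x80)
            return (long)i;
    return -1;
}
```

A string for which this returns -1 behaves the same whether the C locale
is encoded as ASCII or as UTF-8.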
On the other hand, I wonder whether we could special-case the C.UTF-8
locale to treat invalid byte sequences as pseudo-characters, so that
character contexts such as printf become 8-bit transparent instead of
failing with EILSEQ. But such special-casing should be reserved for
C.UTF-8; locales like en_US.UTF-8 should still fail with EILSEQ on
invalid sequences.
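
A rough sketch of how the two behaviors might differ, assuming a
hypothetical decoder decode_one with a lenient flag. The pseudo-character
mapping (0xDC00 plus the offending byte) is only an assumption chosen for
illustration, since no mapping has been settled on, and none of this is
actual Cygwin or newlib code. Truncated sequences are treated as invalid
here for simplicity.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mapping: an invalid byte b becomes the pseudo-character
   0xDC00 + b, a value that valid UTF-8 never decodes to. */
#define PSEUDO_BASE 0xDC00u

/* Decode one character from s (n bytes available) into *out.
   Returns the number of bytes consumed, or 0 if n == 0.
   On an invalid sequence:
     lenient == 0 (think en_US.UTF-8): returns -1, errno = EILSEQ;
     lenient != 0 (think C.UTF-8): consumes one byte and stores a
     pseudo-character for it. */
int decode_one(const unsigned char *s, size_t n, int lenient, uint32_t *out)
{
    unsigned char b;
    int len;
    uint32_t cp, min;

    if (n == 0)
        return 0;
    b = s[0];
    if (b < 0x80) {                       /* plain 7-bit byte */
        *out = b;
        return 1;
    } else if ((b & 0xE0) == 0xC0) {      /* 2-byte lead */
        len = 2; cp = b & 0x1F; min = 0x80;
    } else if ((b & 0xF0) == 0xE0) {      /* 3-byte lead */
        len = 3; cp = b & 0x0F; min = 0x800;
    } else if ((b & 0xF8) == 0xF0) {      /* 4-byte lead */
        len = 4; cp = b & 0x07; min = 0x10000;
    } else {
        goto invalid;                     /* stray continuation or bad lead */
    }

    if ((size_t)len > n)
        goto invalid;                     /* truncated sequence */
    for (int i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            goto invalid;                 /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        goto invalid;
    *out = cp;
    return len;

invalid:
    if (!lenient) {
        errno = EILSEQ;
        return -1;
    }
    *out = PSEUDO_BASE + b;
    return 1;
}
```

With this sketch, a lone 0xE9 byte followed by 'x' is rejected with EILSEQ
in strict mode but decodes to the pseudo-character 0xDCE9 in lenient mode,
after which decoding continues at 'x'.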
--
Don't work too hard, make some time for fun as well!
Eric Blake ebb9@byu.net