This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: "C" character set (again)


2010/1/15 Corinna Vinschen:
>> Here's another concern regarding C changing to ASCII: what would a
>> user who sets LANG=C (or LANG=C.ASCII, for that matter) expect to
>> happen to filenames? Currently, anything non-ASCII would turn into
>> ^X-escaped UTF-8. However, since ASCII doesn't have anything beyond
>> 0x7F (btw, thanks for patching newlib accordingly), the ^X isn't
>> actually necessary and filenames in C(.ASCII) could just use straight
>> UTF-8 anyway.
>>
>> Therefore, would something like the patch below make sense?
>
> I'm pondering this for at least two weeks now. I'm still not sure what
> new problems we add by reverting C to ASCII. As long as the underlying
> charset is UTF-8, I don't see any problems, but that could simply be the
> result of me being too unimaginative.
>
> Anyway, I have something like your patch already in my locale code. I'm
> not setting the cygheap->locale.charset to UTF-8, though. This should
> avoid unnecessary calls to internal_setlocale in child processes.

Makes sense.
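
For reference, the point in the quoted question (ASCII has nothing
beyond 0x7F, so UTF-8 bytes cannot be mistaken for ASCII ones) can be
checked with a small sketch. This is plain Python, not Cygwin code; the
helper names are made up for illustration, and it only shows the
byte-range property of UTF-8, not how Cygwin converts filenames.

```python
def non_ascii_bytes_are_high(text: str) -> bool:
    """Return True if every non-ASCII character in `text` encodes to
    UTF-8 bytes that are all >= 0x80 (i.e. outside the ASCII range)."""
    for ch in text:
        if ord(ch) > 0x7F:
            if any(b < 0x80 for b in ch.encode("utf-8")):
                return False
    return True


def split_by_range(name: str):
    """Split the UTF-8 encoding of `name` into its ASCII bytes and its
    high (>= 0x80) bytes. Hypothetical helper, for illustration only."""
    raw = name.encode("utf-8")
    ascii_part = bytes(b for b in raw if b < 0x80)
    high_part = bytes(b for b in raw if b >= 0x80)
    return ascii_part, high_part


if __name__ == "__main__":
    sample = "na\u00efve-\u65e5\u672c.txt"  # contains non-ASCII characters
    print(non_ascii_bytes_are_high(sample))
    print(split_by_range(sample))
```

Since the two byte ranges never overlap, an ASCII-only charset has no
way to misread a UTF-8 sequence as ASCII text, which is the reason given
above for the ^X escape being unnecessary in that case.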

> I'll apply that, together with setting C to ASCII by default.

Great.

> And a matching change to the docs.

I'll have a closer look at the doc changes later, but one possible
issue is the use of 'default locale'. Unfortunately POSIX uses it for
two different things. From
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html:

1. 'For C-language programs, the POSIX locale shall be the *default
locale* when the setlocale() function is not called. The POSIX locale
can be specified by assigning to the appropriate environment variables
the values "C" or "POSIX".'

2. 'All implementations shall define a locale as the *default locale*,
to be invoked when no environment variables are set, or set to the
empty string. This default locale can be the POSIX locale or any other
implementation-defined locale.'

The implementation-defined default locale on Cygwin of course is "C.UTF-8".

Not sure what to do about this. I've been referring to "C.UTF-8" as
THE 'default locale' and to "C" as the initial locale, but of course
that's not backed by anything and doesn't necessarily help.
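
To make the two senses concrete, here is a rough sketch in Python, whose
locale module wraps the C library's setlocale(). The function names are
mine, not from any Cygwin or POSIX source. It uses LC_NUMERIC because
the Python interpreter adjusts LC_CTYPE on its own at startup, so that
category would not show the untouched initial state.

```python
import locale


def query_locale(category=locale.LC_NUMERIC):
    """Read the current locale for `category` without changing it.
    Before any setlocale() call this is sense 1: the initial locale,
    which POSIX says is the POSIX ("C") locale."""
    return locale.setlocale(category, None)


def adopt_environment_locale(category=locale.LC_NUMERIC):
    """Ask the C library to pick the locale from LC_ALL, the category
    variable, and LANG. If none is set (or set to the empty string),
    the result is sense 2: the implementation-defined default locale.
    Raises locale.Error if the environment names an unavailable locale."""
    return locale.setlocale(category, "")


if __name__ == "__main__":
    print("before setlocale(''):", query_locale())
    print("after setlocale(''): ", adopt_environment_locale())
```

Run with LC_ALL, LC_NUMERIC and LANG all unset, the second line reports
whatever the platform defines for sense 2; what that string looks like
depends on the C library.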

Andy

