This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: "C" UTF-8 trouble


Corinna Vinschen wrote:
On Oct 7 11:08, Andy Koppe wrote:
2009/10/7 Corinna Vinschen:
Urgh. So we have to change nl_langinfo in newlib as well. Do we have
to return "US-ASCII" if charset is "ASCII", or is it sufficient to
return __locale_charset() as you did, thus returning "ASCII" for "ASCII"?
I'd assume so, but WWLD?

===
#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main ()
{
  char *l;

  setlocale (LC_ALL, "");
  l = nl_langinfo (CODESET);
  if (l)
    printf ("%s\n", l);
  return 0;
}
===

$ ./nll
ANSI_X3.4-1968

$ LANG=C.UTF-8 ./nll
ANSI_X3.4-1968

$ LANG=ja_JP ./nll
EUC-JP

$ LANG=ru_RU ./nll
ISO-8859-5

$ LANG=ru_UA ./nll
KOI8-U

$ LANG=zh_CN ./nll
GB2312

$ LANG=zh_TW ./nll
BIG5

Sigh. Do we really need a translation table?
Yes (sigh), and that is what I had suggested before. In fact, "locale charmap" (on a system that has a locale command) gives you the same information as "nll".
If you want a table, a fairly complete one is included in my package mined, in the file src/locales.t (generated from src/locales.cfg).
(Complete in the sense that every locale without an explicit suffix that is not listed there maps to ISO-8859-1; maybe I should also include them to distinguish unknown locales ...)
And, as becomes clear here, the syntax of charmap/codeset names differs between locale names and nl_langinfo, e.g. eucJP vs. EUC-JP.
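
To make the idea of a translation table concrete, here is a minimal sketch, not the table from mined and not what newlib does. The names codeset_entry, default_codeset and suffix_to_codeset are hypothetical, and the entries are only the ones visible in the output quoted above plus the eucJP example; a real table would be far longer. Unknown names return NULL so that the caller can choose its own fallback (ISO-8859-1, say) and still tell unknown locales apart.

```c
#include <stddef.h>
#include <string.h>

/* One row of a translation table: a key and the codeset name in the
   style that nl_langinfo (CODESET) printed in the test run above. */
struct codeset_entry
{
  const char *key;
  const char *codeset;
};

/* Locale names WITHOUT an explicit charset suffix -> default codeset.
   Entries are taken only from the outputs quoted in this thread. */
static const struct codeset_entry default_codesets[] = {
  { "C",     "ANSI_X3.4-1968" },
  { "ja_JP", "EUC-JP" },
  { "ru_RU", "ISO-8859-5" },
  { "ru_UA", "KOI8-U" },
  { "zh_CN", "GB2312" },
  { "zh_TW", "BIG5" },
};

/* Charset suffix as spelled in a locale name -> nl_langinfo spelling.
   Only the single example from this thread is listed. */
static const struct codeset_entry suffix_codesets[] = {
  { "eucJP", "EUC-JP" },
};

/* Linear search; returns NULL when the key is missing or not listed. */
static const char *
lookup (const struct codeset_entry *table, size_t n, const char *key)
{
  size_t i;

  if (!key)
    return NULL;
  for (i = 0; i < n; i++)
    if (strcmp (key, table[i].key) == 0)
      return table[i].codeset;
  return NULL;
}

/* Default codeset for a locale name such as "ja_JP", or NULL. */
const char *
default_codeset (const char *locale)
{
  return lookup (default_codesets,
                 sizeof default_codesets / sizeof default_codesets[0],
                 locale);
}

/* nl_langinfo-style name for a locale suffix such as "eucJP", or NULL. */
const char *
suffix_to_codeset (const char *suffix)
{
  return lookup (suffix_codesets,
                 sizeof suffix_codesets / sizeof suffix_codesets[0],
                 suffix);
}
```

With this sketch, default_codeset ("ru_UA") yields "KOI8-U" and suffix_to_codeset ("eucJP") yields "EUC-JP", while a locale such as "xx_XX" that is not in the table yields NULL.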


Thomas

