This is the mail archive of the cygwin-developers mailing list for the Cygwin project.
Re: default charset for implicit locale specification
On Jan 20 07:29, Andy Koppe wrote:
> However, as Thomas Wolff mentioned previously, there's a de-facto
> standard for the charset used with each language when none is
> specified explicitly, so implementing that instead is worth
> considering.
The problem is that this information isn't provided by Windows. I can
fetch the ANSI or OEM codepage, but not the ISO-8859 compatible codepage
for a language, if such a codepage exists.
Further testing shows that only a handful of codepages are used as
default ANSI codepages for languages. This would make a very small
transition table:
   874  ANSI/Thai              ->  CP874 (== ISO-IR-166 used on Linux)
   932  SJIS                   ->  SJIS
   936  GB2312                 ->  GBK
   949  ANSI/Korean            ->  EUCKR
   950  Big-5                  ->  Big-5
  1250  ANSI/Central European  ->  ISO-8859-2
  1251  ANSI/Cyrillic          ->  ISO-8859-5
  1252  ANSI/Latin 1           ->  ISO-8859-1
  1253  ANSI/Greek             ->  ISO-8859-7
  1254  ANSI/Turkish           ->  ISO-8859-9
  1255  ANSI/Hebrew            ->  ISO-8859-8
  1256  ANSI/Arabic            ->  ISO-8859-6
  1257  ANSI/Baltic            ->  ISO-8859-4
  1258  ANSI/Vietnamese        ->  UTF-8
 65001  UTF-8                  ->  UTF-8
Is that a valid transition?
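
For illustration, the table could be expressed as a small lookup like the one below. This is only a sketch under assumptions: the struct and function names are made up for this example and are not existing Cygwin code, and the charset strings are spelled exactly as in the table above rather than as any library expects them.

```c
#include <stddef.h>

/* Hypothetical table: default ANSI codepage -> charset to assume when
   the locale string names no charset.  Values copied from the mail. */
struct cp_charset
{
  unsigned codepage;
  const char *charset;
};

static const struct cp_charset cp_table[] =
{
  {   874, "CP874" },
  {   932, "SJIS" },
  {   936, "GBK" },
  {   949, "EUCKR" },
  {   950, "Big-5" },
  {  1250, "ISO-8859-2" },
  {  1251, "ISO-8859-5" },
  {  1252, "ISO-8859-1" },
  {  1253, "ISO-8859-7" },
  {  1254, "ISO-8859-9" },
  {  1255, "ISO-8859-8" },
  {  1256, "ISO-8859-6" },
  {  1257, "ISO-8859-4" },
  {  1258, "UTF-8" },
  { 65001, "UTF-8" }
};

/* Return the charset for a codepage, or NULL if the codepage is not
   one of the default ANSI codepages listed in the table. */
const char *
default_charset_for_codepage (unsigned codepage)
{
  size_t i;

  for (i = 0; i < sizeof cp_table / sizeof cp_table[0]; i++)
    if (cp_table[i].codepage == codepage)
      return cp_table[i].charset;
  return NULL;
}
```

A caller would fetch the ANSI codepage of the locale from Windows first and pass it in; a NULL result leaves room for whatever fallback is preferred.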
What's missing is a transition to ISO-8859-15 for languages that use the
euro currency sign. I assume that's done by adding the @euro modifier?
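
If the @euro modifier is indeed the way to go, the switch might look roughly like the sketch below. The function name is hypothetical, and the rule itself (only upgrade ISO-8859-1, and only when the modifier is exactly "@euro") is an assumption for illustration, not settled behaviour.

```c
#include <string.h>

/* Hypothetical helper: given a POSIX locale string and the charset
   picked by default, return "ISO-8859-15" when the locale carries the
   "@euro" modifier and the default was ISO-8859-1.  In every other
   case the default charset is returned unchanged. */
const char *
apply_euro_modifier (const char *locale, const char *charset)
{
  const char *modifier = strchr (locale, '@');

  if (modifier != NULL
      && strcmp (modifier, "@euro") == 0
      && strcmp (charset, "ISO-8859-1") == 0)
    return "ISO-8859-15";
  return charset;
}
```

With this rule, "de_DE@euro" would end up with ISO-8859-15 while plain "de_DE" stays at ISO-8859-1.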
> But at least the Windows-based solution should come
> fairly close to it, because many of the Windows codepages are largely
> compatible to their ISO equivalents. And it uses data that's already
> there, avoiding the need for maintaining a mapping table.
That's what I like most. Windows has (almost) all the information
we need. Why not just use it?
> Btw, just out of curiosity, how do you find the Windows locale for a
> given POSIX locale? Do you have to iterate through all the Windows
> locales until finding one with the correct ISO language and territory
> codes?
Starting with Windows Vista, Windows uses (almost) POSIX compatible
locale strings, rather than numerical LCIDs to specify a locale. For
instance, "German (Germany)" has the locale string "de-DE". The only
difference is the dash instead of the underscore. Windows also knows
languages without territory, like "de". There's a new call
LocaleNameToLCID(), which converts the (almost) POSIX compatible locale
string to an LCID, so I can use LCIDs for further stuff. On systems
before Vista I have to iterate through the LCIDs, but that's quickly done
since the valid range is small.
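
As a rough sketch of the string handling on Vista and later: the helper below turns a POSIX locale string into the dash-separated form. The function name is invented for this example. It only prepares the name; the result would still have to be converted to a wide string before being handed to LocaleNameToLCID(), which is left out so the snippet stays portable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the language[_TERRITORY] part of a POSIX
   locale string into OUT, replacing '_' with '-'.  Anything from the
   first '.' (charset) or '@' (modifier) onwards is dropped.
   Returns 0 on success, -1 if the name is empty or OUT is too small. */
int
posix_to_windows_locale_name (const char *posix, char *out, size_t outlen)
{
  size_t len = strcspn (posix, ".@");
  size_t i;

  if (len == 0 || len >= outlen)
    return -1;
  for (i = 0; i < len; i++)
    out[i] = posix[i] == '_' ? '-' : posix[i];
  out[len] = '\0';
  return 0;
}
```

For example, "de_DE.UTF-8" becomes "de-DE", and a bare "de" is passed through unchanged.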
Corinna
--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat