This is the mail archive of the cygwin mailing list for the Cygwin project.


[1.7] Support for CJK Character Sets


On 2009/04/02 22:46, Corinna Vinschen wrote:
> > Btw., it's really not tricky to create a filename with special
> > characters:

I used Corinna's tiny program
(http://sourceware.org/ml/cygwin/2009-04/msg00053.html )
to create a file whose name contains a CJK character, and tested
how setting LANG affects the way the name is shown.

I changed 0x20ac to 0x4e00 (<CJK Ideograph, First>). This is one of the
characters used in all three languages (Chinese, Japanese and Korean).
It is 0xe4 0xb8 0x80 in UTF-8. So, without setting LANG, the file name
should look like "qq\016\344\270\200". (Note that the \016 is ASCII SO,
which shows that cygwin could not convert the following character to
the current character set.)
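
The byte values above can be re-derived by hand. The following is only a
minimal sketch using plain shell arithmetic; it assumes a code point in
the three-byte UTF-8 range (U+0800 to U+FFFF), and utf8_3byte is a
hypothetical helper name, not part of cygwin or any other tool.

```shell
# Sketch: UTF-8 bytes of a code point in U+0800..U+FFFF (three-byte form).
# utf8_3byte is a hypothetical helper name used only for illustration.
utf8_3byte() {
  cp=$(( $1 ))
  b1=$(( 0xE0 | (cp >> 12) ))            # lead byte: 1110xxxx
  b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))    # continuation: 10xxxxxx
  b3=$(( 0x80 | (cp & 0x3F) ))           # continuation: 10xxxxxx
  printf '%02x %02x %02x\n' "$b1" "$b2" "$b3"        # hex, as od -t x1 prints it
  printf '\\%03o\\%03o\\%03o\n' "$b1" "$b2" "$b3"    # octal escapes, as in the file name
}

utf8_3byte 0x4e00
# e4 b8 80
# \344\270\200
```

The second output line is the part of the unconverted file name that
follows the \016.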

I checked how the file name looks with LANG set to each character set.
The list of supported character sets is at
http://cygwin.com/1.7/cygwin-ug-net/setup-locale.html .

The result (see below) was that the filename was correctly converted
to UTF-8, SJIS, GBK, Big5 and eucKR. In each case it matched the
name converted using iconv.

But it failed for JIS/ISO-2022-JP and eucJP. (The name was represented
as ASCII SO (0x0e) followed by the UTF-8 sequence.)
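
A failed conversion of this kind can be spotted mechanically by looking
for the 0x0e byte in the ls output. The following is only a rough
sketch: has_so_fallback is a hypothetical helper name, and it assumes
that 0x0e never occurs in a correctly converted name, which may not
hold for every 7-bit encoding.

```shell
# Sketch: succeed if the byte stream on stdin contains ASCII SO (0x0e).
# has_so_fallback is a hypothetical helper name; it assumes that SO does
# not appear in a correctly converted name.
has_so_fallback() {
  # Dump every byte as two hex digits and join od's output lines.
  hex=$(od -An -v -t x1 | tr '\n' ' ')
  case " $hex " in
    *" 0e "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage, with the unconverted name from this message as sample input:
if printf 'qq\016\344\270\200\n' | has_so_fallback; then
  echo "SO fallback present"
else
  echo "converted cleanly"
fi
```

With a real file, "ls q* | has_so_fallback" would take the place of the
printf.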

What is going wrong here? What makes the file name conversion from
UTF-16 to these character sets fail? Or what am I doing wrong?

Any hints?
--
neomjp


for lang in UTF-8 SJIS GBK Big5 ISO-2022-JP eucJP eucKR; do
  # File name as ls shows it under the character set being tested
  export LANG="en_US.${lang}"
  echo; echo "LANG=${LANG}"
  ls q* | od -t x1 -t a
  # Reference: the UTF-8 name converted to the same character set by iconv
  export LANG="en_US.UTF-8"
  echo "This must be identical to:"
  ls q* | iconv -f UTF-8 -t "${lang}" | od -t x1 -t a
  unset LANG
done

LANG=en_US.UTF-8
0000000  71  71  e4  b8  80  0a
          q   q   d   8 nul  nl
0000006
This must be identical to:
0000000  71  71  e4  b8  80  0a
          q   q   d   8 nul  nl
0000006

LANG=en_US.SJIS
0000000  71  71  88  ea  0a
          q   q  bs   j  nl
0000005
This must be identical to:
0000000  71  71  88  ea  0a
          q   q  bs   j  nl
0000005

LANG=en_US.GBK
0000000  71  71  d2  bb  0a
          q   q   R   ;  nl
0000005
This must be identical to:
0000000  71  71  d2  bb  0a
          q   q   R   ;  nl
0000005

LANG=en_US.Big5
0000000  71  71  a4  40  0a
          q   q   $   @  nl
0000005
This must be identical to:
0000000  71  71  a4  40  0a
          q   q   $   @  nl
0000005

LANG=en_US.ISO-2022-JP
0000000  71  71  0e  e4  b8  80  0a
          q   q  so   d   8 nul  nl
0000007
This must be identical to:
0000000  71  71  1b  24  42  30  6c  1b  28  42  0a
          q   q esc   $   B   0   l esc   (   B  nl
0000013

LANG=en_US.eucJP
0000000  71  71  0e  e4  b8  80  0a
          q   q  so   d   8 nul  nl
0000007
This must be identical to:
0000000  71  71  b0  ec  0a
          q   q   0   l  nl
0000005

LANG=en_US.eucKR
0000000  71  71  ec  e9  0a
          q   q   l   i  nl
0000005
This must be identical to:
0000000  71  71  ec  e9  0a
          q   q   l   i  nl
0000005
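
A note on reading the dumps above: the second row of each od block comes
from -t a, which uses only the low seven bits of each byte, so bytes of
0x80 and above show up as unrelated ASCII characters. A small sketch of
that mapping follows; low7 is a hypothetical helper name, and it prints
the raw character rather than od's names (nul, bs, and so on) for
control codes.

```shell
# Sketch: the character od -t a shows for a byte, found by clearing the
# high-order bit. low7 is a hypothetical helper name; the argument is
# two hex digits without a 0x prefix.
low7() {
  n=$(( 0x$1 & 0x7F ))
  printf "\\$(printf '%03o' "$n")\n"
}

low7 e4   # d
low7 b8   # 8
low7 ea   # j
```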
