This is the mail archive of the cygwin mailing list for the Cygwin project.
[1.7] Support for CJK Character Sets
- From: neomjp <neomjp at yahoo dot co dot jp>
- To: cygwin at cygwin dot com
- Date: Sat, 4 Apr 2009 02:32:11 +0900 (JST)
- Subject: [1.7] Support for CJK Character Sets
On 2009/04/02 22:46, Corinna Vinschen wrote:
> > Btw., it's really not tricky to create a filename with special
> > characters:
I used Corinna's tiny program
(http://sourceware.org/ml/cygwin/2009-04/msg00053.html )
to create a file whose name contains a CJK character, and tested
how setting LANG affects the way that name is displayed.
I changed 0x20ac to 0x4e00 (<CJK Ideograph, First>), a character used
in all three languages (Chinese, Japanese, and Korean). In UTF-8 it is
encoded as the bytes 0xe4 0xb8 0x80. So, without setting LANG, the file
name should look like "qq\016\344\270\200". (Note that the \016 is
ASCII SO, which shows that Cygwin could not convert the next character
to the current character set.)
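For readers who want to check the encoding above without a C compiler, here is a minimal shell sketch. It assumes the shell runs under a UTF-8 locale (e.g. LANG=en_US.UTF-8) when the file is created, so that Cygwin takes the UTF-8 bytes as U+4E00. `hex_bytes` and `make_cjk_test_file` are hypothetical helper names, not part of Cygwin or of Corinna's program.

```sh
# Hypothetical helper: print the bytes of a string as space-separated hex.
hex_bytes() {
  printf '%s' "$1" | od -An -v -t x1 | xargs
}

# Hypothetical helper: create an empty file named "qq" + U+4E00 in directory $1
# (default: current directory). Assumes a UTF-8 locale, so the octal escapes
# below (the UTF-8 bytes e4 b8 80) are taken as U+4E00.
make_cjk_test_file() {
  : > "${1:-.}/$(printf 'qq\344\270\200')"
}

hex_bytes "$(printf 'qq\344\270\200')"      # prints: 71 71 e4 b8 80
hex_bytes "$(printf 'qq\016\344\270\200')"  # prints: 71 71 0e e4 b8 80
```

The second line reproduces the SO-prefixed form expected when LANG is not set. Running `make_cjk_test_file .` with LANG=en_US.UTF-8 exported should give a file comparable to the one created by the C program.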
I then checked how the file name is displayed with LANG set to each
character set in turn. A list of supported character sets can be found at
http://cygwin.com/1.7/cygwin-ug-net/setup-locale.html .
The result (see below) was that the file name was correctly converted
to UTF-8, SJIS, GBK, Big5, and eucKR; in each case the bytes matched
the name as converted by iconv.
But the conversion failed for JIS (ISO-2022-JP) and eucJP: the character
was represented as an ASCII SO (0x0e) followed by its UTF-8 byte sequence.
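To see what eucJP and ISO-2022-JP should have produced, independent of any file on disk, the reference bytes can be generated directly from the UTF-8 encoding. A minimal sketch, assuming an iconv that accepts the charset names EUC-JP and ISO-2022-JP (GNU libiconv and glibc both do); `expected_bytes` is a hypothetical helper name.

```sh
# Hypothetical helper: convert "qq" + U+4E00 from UTF-8 to charset $1 with iconv
# and print the resulting bytes in hex, i.e. what a correct listing should
# contain (without the trailing newline that ls adds).
expected_bytes() {
  printf 'qq\344\270\200' | iconv -f UTF-8 -t "$1" | od -An -v -t x1 | xargs
}

expected_bytes EUC-JP        # prints: 71 71 b0 ec
expected_bytes ISO-2022-JP   # JIS code 30 6c between ESC $ B and ESC ( B
```

Neither result contains the 0e byte that Cygwin printed, which is what distinguishes the failing cases from the working ones.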
What is going wrong here? What makes the file name conversion from
UTF-16 to these character sets fail? Or, what am I doing wrong?
Any hints?
--
neomjp
for lang in UTF-8 SJIS GBK Big5 ISO-2022-JP eucJP eucKR; do
  # Show the file name as Cygwin converts it for this charset
  export LANG="en_US.${lang}"
  echo; echo "LANG=${LANG}"
  ls q* | od -t x1 -t a
  # Reference: list the name as UTF-8 and convert it with iconv instead
  export LANG="en_US.UTF-8"
  echo "This must be identical to:"
  ls q* | iconv -f UTF-8 -t "${lang}" | od -t x1 -t a
  unset LANG
done

LANG=en_US.UTF-8
0000000  71  71  e4  b8  80  0a
          q   q   d   8 nul  nl
0000006
This must be identical to:
0000000  71  71  e4  b8  80  0a
          q   q   d   8 nul  nl
0000006

LANG=en_US.SJIS
0000000  71  71  88  ea  0a
          q   q  bs   j  nl
0000005
This must be identical to:
0000000  71  71  88  ea  0a
          q   q  bs   j  nl
0000005

LANG=en_US.GBK
0000000  71  71  d2  bb  0a
          q   q   R   ;  nl
0000005
This must be identical to:
0000000  71  71  d2  bb  0a
          q   q   R   ;  nl
0000005

LANG=en_US.Big5
0000000  71  71  a4  40  0a
          q   q   $   @  nl
0000005
This must be identical to:
0000000  71  71  a4  40  0a
          q   q   $   @  nl
0000005

LANG=en_US.ISO-2022-JP
0000000  71  71  0e  e4  b8  80  0a
          q   q  so   d   8 nul  nl
0000007
This must be identical to:
0000000  71  71  1b  24  42  30  6c  1b  28  42  0a
          q   q esc   $   B   0   l esc   (   B  nl
0000013

LANG=en_US.eucJP
0000000  71  71  0e  e4  b8  80  0a
          q   q  so   d   8 nul  nl
0000007
This must be identical to:
0000000  71  71  b0  ec  0a
          q   q   0   l  nl
0000005

LANG=en_US.eucKR
0000000  71  71  ec  e9  0a
          q   q   l   i  nl
0000005
This must be identical to:
0000000  71  71  ec  e9  0a
          q   q   l   i  nl
0000005
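The comparison above is done by eye. A minimal sketch of the same loop that reports a match or mismatch per charset, assuming it is run in the directory holding the test file, that LC_ALL and LC_CTYPE are unset (they would override LANG), and that the en_US.<charset> locale names are accepted as in the original loop. `check_charset` is a hypothetical helper name.

```sh
# Hypothetical helper: list the q* file under LANG=en_US.$1 and compare its
# bytes with the UTF-8 listing converted to the same charset by iconv.
check_charset() {
  got=$(LANG="en_US.$1" ls q* | od -An -v -t x1 | xargs)
  want=$(LANG=en_US.UTF-8 ls q* | iconv -f UTF-8 -t "$1" | od -An -v -t x1 | xargs)
  if [ "$got" = "$want" ]; then
    echo "$1: match ($got)"
  else
    echo "$1: MISMATCH (got $got, expected $want)"
  fi
}

for cs in UTF-8 SJIS GBK Big5 ISO-2022-JP eucJP eucKR; do
  check_charset "$cs"
done
```

On the Cygwin build described above, this would be expected to flag ISO-2022-JP and eucJP as mismatches and the other charsets as matches.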