This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Unable to open files including Korean names


On Tue, 2004-06-15 09:14:22 -0400, Pierre A. Humblet wrote:
> Thanks. Nothing conclusive.
> Could you compile and run the following one line program? 
> 
> #include <windows.h>
> #include <stdio.h>
> 
> int main(void)
> {
>     /* Nonzero: Win32 file APIs use the ANSI code page; 0: the OEM code page. */
>     printf("AreFileApisANSI %d\n", AreFileApisANSI());
>     return 0;
> }
>  
> Compile it with
> gcc -mno-cygwin try_ansi.c 
> 
> With the -mno-cygwin, the value of CYGWIN=codepage:oem
> shouldn't matter. When compiled without that switch
> codepage:oem or codepage:ansi should matter.
> 
> Running on 1.5.9 is OK.

Here's the result:

$ gcc -mno-cygwin try_ansi.c 
$ ./a.exe 
AreFileApisANSI 1
$ 

> 
> Also, the Korean directory name has numerical value
> ~> od -x xx.txt 
> 0000000 d1c7 dbb1
> 
> Do you know what encoding that is? Is it Unicode or UTF8?
> If it is UTF8, do you know what the Unicode values should be?

Well, that's EUC-KR, or equivalently CP949.  CP949 is a superset of
EUC-KR that defines additional characters in the code ranges EUC-KR
leaves unused.  The directory name I used, ``한글'', which is pronounced
``hangeul'' and means the Korean written language, consists of two
characters:
 U+D55C: Hangul syllable Hieuh A Nieun,
 U+AE00: Hangul syllable Kiyeok Eu Rieul.
(You may be able to find them in the Windows charmap.)
Neither character is in CP949's extension, so each has the same value
in both EUC-KR and CP949.
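
As a cross-check on those two character names, a precomposed Hangul
syllable's code point can be split arithmetically into its leading
consonant, vowel, and trailing consonant, following the Hangul syllable
arithmetic in the Unicode Standard.  The following is only a minimal
sketch of that arithmetic; the names `hangul_decompose` and
`struct jamo_index` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Indices into the Unicode jamo tables:
   lead 0-18, vowel 0-20, tail 0-27 (0 means no trailing consonant). */
struct jamo_index { int lead, vowel, tail; };

/* Split a precomposed Hangul syllable (U+AC00..U+D7A3) into jamo indices.
   Returns 0 on success, -1 if cp is not a precomposed Hangul syllable. */
int hangul_decompose(unsigned cp, struct jamo_index *out)
{
    const unsigned S_BASE = 0xAC00, V_COUNT = 21, T_COUNT = 28;
    const unsigned S_COUNT = 19 * V_COUNT * T_COUNT;   /* 11172 syllables */

    assert(out != NULL);
    if (cp < S_BASE || cp >= S_BASE + S_COUNT)
        return -1;

    unsigned s = cp - S_BASE;
    out->lead  = (int)(s / (V_COUNT * T_COUNT));
    out->vowel = (int)((s % (V_COUNT * T_COUNT)) / T_COUNT);
    out->tail  = (int)(s % T_COUNT);
    return 0;
}
```

For U+D55C this gives lead 18 (Hieuh), vowel 0 (A), tail 4 (Nieun); for
U+AE00 it gives lead 0 (Kiyeok), vowel 18 (Eu), tail 8 (Rieul), which
agrees with the names listed above.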

Yes, that is the same value I get.
Running `echo -n 한글 | od -x -` prints:
0000000 d1c7 dbb1

Now, `echo -n 한글 | iconv -f euc-kr -t utf-8 | od -x -` tells me:
0000000 95ed ea9c 80b8

So yes, it's in EUC-KR (or, equivalently in this case, CP949).  I don't
use a Unicode environment yet; actually, I don't know how to change the
encoding in Windows.  The Korean version of Windows simply uses CP949 by
default.

od's output looks little-endian: it prints each 16-bit word with its two
bytes swapped.  Converting to UCS-2 confirms that the characters are
U+D55C and U+AE00; `echo -n 한글 | iconv -f euc-kr -t ucs-2 | od -x -`
prints:
0000000 5cd5 00ae
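
The byte swapping is easy to trip over when reading these dumps, so here
is a minimal sketch of undoing it and decoding the UTF-8 result.  It
assumes od ran on a little-endian host (as on x86), where `od -x` shows
the second byte of each pair first.  The helper names
`od_words_to_bytes` and `utf8_decode` are hypothetical, and the decoder
only handles 1 to 3 byte sequences without rejecting overlong forms or
surrogates, so treat it as an illustration, not a validator.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the byte stream from words printed by `od -x` on a little-endian
   host: the low byte of each word is the byte that came first in the input.
   out must have room for 2 * n bytes. */
void od_words_to_bytes(const unsigned short *words, size_t n, unsigned char *out)
{
    assert(words != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)(words[i] & 0xFF);
        out[2 * i + 1] = (unsigned char)((words[i] >> 8) & 0xFF);
    }
}

/* Decode one UTF-8 sequence of up to three bytes (enough for the BMP).
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 if the bytes do not form a 1-3 byte sequence. */
size_t utf8_decode(const unsigned char *s, size_t len, unsigned *cp)
{
    assert(s != NULL && cp != NULL);
    if (len >= 1 && s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if (len >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned)(s[0] & 0x1F) << 6) | (unsigned)(s[1] & 0x3F);
        return 2;
    }
    if (len >= 3 && (s[0] & 0xF0) == 0xE0 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned)(s[0] & 0x0F) << 12) |
              ((unsigned)(s[1] & 0x3F) << 6) |
              (unsigned)(s[2] & 0x3F);
        return 3;
    }
    return 0;
}
```

Feeding it the words 95ed ea9c 80b8 from the UTF-8 dump yields the bytes
ED 95 9C EA B8 80, which decode to U+D55C and U+AE00.  By the same
swapping rule, the EUC-KR dump d1c7 dbb1 corresponds to the bytes
C7 D1 B1 DB.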


> Thanks for your help

My pleasure. :)


BTW, is there any reason you're not sending your messages to the cygwin
mailing list?  If not, I'll just keep Cc'ing it.

-- 
신재호 | Jaeho Shin <netj@sparcs.kaist.ac.kr> | http://netj.org/
System Programmers' Association for Researching Computer Systems
Division of Computer Science, Department of EECS, KAIST


