[1.7] UTF-8, find vs. tar

Corinna Vinschen corinna-cygwin@cygwin.com
Fri Sep 25 09:15:00 GMT 2009


On Sep 24 22:40, Yaakov S wrote:
> I'm having some difficulty with a package containing a file with a UTF-8  
> character:
>
> wget http://downloads.sourceforge.net/klavaro/klavaro-1.3.1.tar.bz2
> tar jxf klavaro-1.3.1.tar.bz2
> cd klavaro-1.3.1
> tar jcf TEST.tar.bz2 data/dvorak_fr*
> tar jtf TEST.tar.bz2 > tmptar.out
> find data/ -name 'dvorak_fr*' > tmpfind.out
> diff -u tmpfind.out tmptar.out
>
> The character in question is 'é' (aka U+00E9, small e with acute)[1].  
> The difference in rendering is throwing cygport off at the "checking  
> packages for missing/duplicate files" stage.
>
> What do I need to do to get these to match?

Nothing but wait.  The reason that tar doesn't print the characters
while find does is probably that find calls setlocale and tar
doesn't.  I hope to get this fixed in the next couple of days.
We're discussing the entire locale stuff on the cygwin-developers
list right now; see the threads starting at
http://cygwin.com/ml/cygwin-developers/2009-09/msg00009.html
and
http://cygwin.com/ml/cygwin-developers/2009-09/msg00017.html

My current locally patched DLL doesn't have that problem anymore,
so we're hopefully on the right track.
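
To see whether the listing really depends on the locale, something like
the following could be tried.  This is only a rough sketch, assuming a
POSIX shell, mktemp and a GNU-style tar; the scratch directory and the
file name are made up for illustration and are not the file from the
klavaro tarball.

```shell
#!/bin/sh
# Sketch only: the scratch directory and the file name are hypothetical.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1

# "é" (U+00E9) is the two bytes C3 A9 in UTF-8.
name=$(printf 'dvorak_fr_t\303\251st.kbd')
: > "$name"
tar cf TEST.tar "$name"

# find writes the raw bytes of the name when stdout is not a terminal.
find_out=$(find . -name 'dvorak_fr*')
find_out=${find_out#./}

# Force the C locale: GNU tar then typically escapes the non-ASCII bytes.
tar_c=$(LC_ALL=C tar tf TEST.tar)

# Use whatever locale the environment provides (LANG, LC_ALL, ...).
tar_env=$(tar tf TEST.tar)

printf 'find:      %s\n' "$find_out"
printf 'tar (C):   %s\n' "$tar_c"
printf 'tar (env): %s\n' "$tar_env"

# Clean up the scratch directory.
cd / && rm -rf "$work"
```

If the "tar (C)" line shows octal escapes while the "find" line shows
the plain character, the difference comes from the locale the tool runs
in, which would be consistent with the setlocale explanation above.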


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat



