This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters


On Jun 25 18:03, Corinna Vinschen wrote:
> On Jun 25 15:38, Lavrentiev, Anton (NIH/NLM/NCBI) [C] wrote:
> > > Your locale is zh_CN.UTF-8.  What you're expecting is only guaranteed
> > > in the C locale:
> > 
> > I'm not quite sure it applies here.  I'm using US English Windows 7.
> > 
> > LANG = 'en_US.UTF-8'
> > 
> > I get the same result:
> > 
> > $ echo abcdeABCDE | sed -e 's/[B-D]/_/g'
> > ab__eA___E
> > 
> > BUT:
> > 
> > $ echo abcdeABCDE | LANG=C sed 's/[B-D]/_/g'
> > abcdeA___E
> > 
> > This is very weird, indeed.
> > 
> > OTOH, in Linux I have the same LANG setup, yet it does work
> > correctly:
> > 
> > > echo $LANG
> > en_US.UTF-8
> > > echo abcdeABCDE | sed -e 's/[B-D]/_/g'
> > abcdeA___E
> > 
> > I believe that an en_US UTF-8 string representation for
> > "abcdeABCDE" is not any different from ASCII.
> 
> Wrong.  Try this:
> 
>   $ sort
>   a
>   b
>   c
>   d
>   e
>   A
>   B
>   C
>   D
>   E
>   <Ctrl-D>
>   a
>   A
>   b
>   B
>   c
>   C
>   d
>   D
>   e
>   E

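Which also means, AFAICS, that Cygwin's sed is doing it right and Linux's
sed is doing it wrong: in this collation order the range B-D covers B, c,
C, d, and D, which is exactly what Cygwin's sed replaced.  Yes, that
puzzles me a bit at the moment, too.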
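
If the goal is simply to replace the uppercase letters B, C and D whatever
the locale, two workarounds follow from the results quoted above.  This is
a minimal sketch, not a statement about how any particular sed build
implements ranges.  It uses LC_ALL rather than the LANG from the quoted
command because LC_ALL overrides both LANG and LC_COLLATE; the variable
names c_locale and explicit are made up for illustration.

```shell
# Force the C locale for this one command, so that [B-D] means the
# plain byte range B, C, D regardless of the user's LANG setting.
c_locale=$(echo abcdeABCDE | LC_ALL=C sed -e 's/[B-D]/_/g')
echo "$c_locale"    # abcdeA___E

# Or list the characters explicitly: with no range in the bracket
# expression, the locale's collation order no longer matters.
explicit=$(echo abcdeABCDE | sed -e 's/[BCD]/_/g')
echo "$explicit"    # abcdeA___E
```

The second form is probably the safer choice in scripts, since it does not
depend on which locale variables happen to be set in the caller's
environment.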


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat