This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: zsh 4.3.9-1: text-mode stdin problem (breaking base64)


On Wed, 21 Apr 2010, Yutaka Amanai wrote:

2010/04/21 2:12 Peter A. Castro wrote:
Greetings, Yutaka,

Greetings, Peter. Thank you for your reply.

Greetings, again, Yutaka,


The text-mode "hack" was created to solve a basic problem that zsh has
with running scripts, in general, on Windows. Much of the code assumes
that scripts have a single-character line terminator (eg: LF). So do
many text-based programs and filters. Windows "native" line termination
is (still) CRLF and zsh code does not deal well with this.

Cygwin's text-mode munches the CR from the stream input leaving the LF
which works well in 99% of the usage cases. Without it, Zsh treats the
CR as part of the input line and tries to parse it as such leading to
"Bad Things"(tm) happening. The same thing would be true of data read
via the shell and passed to other programs as stdin. There are also some
size calculations that only work with a single-character line terminator
(at least in zsh code).
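
To illustrate the point above, here is a minimal sketch in Python. It is a toy line splitter, not zsh's actual parser; the names split_commands and strip_cr are hypothetical, and strip_cr is only an assumed model of what text-mode input does to the stream.

```python
# Hypothetical illustration only: a toy line splitter, not zsh's real code.

def split_commands(script: bytes) -> list:
    """Split a script on LF only, the way LF-assuming code would."""
    return [line for line in script.split(b"\n") if line]


def strip_cr(script: bytes) -> bytes:
    """Assumed model of text-mode input: each CRLF becomes a bare LF."""
    return script.replace(b"\r\n", b"\n")


# A script saved with Windows (CRLF) line endings.
crlf_script = b"echo hello\r\nls\r\n"

raw_lines = split_commands(crlf_script)
clean_lines = split_commands(strip_cr(crlf_script))

print(raw_lines)    # every line still carries a trailing CR
print(clean_lines)  # CR removed, so the command names are intact
print(len(crlf_script), len(strip_cr(crlf_script)))  # sizes differ too
```

Without the stripping step the second command is "ls" followed by a CR, which is not the name of any program, and any length computed from the raw stream is off by one byte per line.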

Could you give me a simple test case that fails without cygwin_premain0()? I set my filesystems as text-mode and tried to find such cases, but I couldn't.

It's been a while since I've looked at this, but the problem was mostly with binary-mode mounts, not text-mode mounts. The problem was that, say, you had your root mounted as text-mode, but your /tmp mounted as binary-mode. Zsh (and other utilities) create temp files fairly often and feed those as input to itself or other programs. Or, reverse the case (root mounted binary and /tmp mounted text).

{f}open() in Cygwin is context sensitive to the filesystem mount mode.
This leads to such situations as calling fopen("/tmp/foo","r") and
expecting it to read "text" lines, but "/tmp" is mounted binary and file
"foo" contains CRLF's because it was created by a Windows program or
editor.  So, when you read the lines you will get the CR as well as the
LF, when you really only want the LF.  Whereas if "/tmp" was mounted
text, the CR would be stripped off as part of text processing.
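
The difference can be sketched with Python's own file API, offered here only as an analogy. In Python the caller picks the behaviour with the mode and newline arguments of open(), whereas in the Cygwin situation described above the mount mode of the directory picks it for you. The file contents are made up for the example.

```python
import os
import tempfile

# Create a file the way a Windows editor might: CRLF line endings.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"first\r\nsecond\r\n")

# "Binary" view: the bytes come back untouched, CR included.
with open(path, "rb") as f:
    binary_lines = f.read().split(b"\n")

# "Text" view: newline=None turns on universal newlines, so CRLF is
# translated to LF while reading.
with open(path, "r", newline=None, encoding="ascii") as f:
    text_lines = f.read().split("\n")

os.remove(path)

print(binary_lines)  # lines end in CR
print(text_lines)    # CR has been stripped
```

The same file yields different lines depending on which view is in effect, which is exactly the surprise when the code that calls fopen() does not know how the directory is mounted.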

I thought about two cases:
* If you don't use CRLF scripts at all and mount all your filesystems as
 binary-mode, there should be no problem (without premain hack).

In a pure Cygwin ecosystem that might work. However, many Cygwin users have to interact with non-Cygwin created data and files. If you ask the good users on this mailing list you might find that people have any combination of file systems mounted for their particular needs.

* If you use CRLF scripts and mount all your filesystem as text-mode,
 there should be no problem (without premain hack).

But, now, you won't get binary data from the files using a naked "open()" as so many typically coded apps do.

Is it right?

If you could keep things strictly black-and-white like that, yes, in theory these could work. Well, the first one would be preferable as opposed to the second one. But the problem is that most Cygwin users don't operate in such a strict environment.

A while back I looked at making changes to somehow accommodate CRLF, but
there are many places in the code that would require some heavy changes
(some of which I'm still not certain would be correct) and would make it
difficult to maintain. I doubt that Zsh base would accept such changes
either, as they would be an intrusive hack for Windows-only support. By
contrast the premain hack was elegant and global.

I could have simply told people that they had to run scripts from a
non-text-mode mount, that their /tmp had to also be on a non-text-mode
mount and all data the scripts explicitly read from were also on a
non-text-mode mount AND all scripts (and input data) must be non-CRLF.
Think that would fly? Me neither. That was the basis for this "fix" in
the first place.

I don't know the zsh code well, but I think it will be hard to do the hack without cygwin_premain0(), as you said. But, how about bash? bash does not seem to have such hacks, yet it seems to work well. And I think it's confusing that bash and zsh treat stdin in different modes.

Have a look at Bash code some time. I recall seeing some O_TEXT options being set in the various {f}open()'s that it does. Again, I looked at doing the same in Zsh code, but after some initial experiments it proved that there were too many dependencies and assumptions about the carriage-control of "text" files to make it work quickly.

And how is base64's deficiency a zsh problem? Stdin/Stdout are "text"
handles, which implies possible data manipulation along those lines.
There's no guarantee that they would pass binary data.
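
As a rough sketch of why a text-mode stdin breaks an encoder such as base64, the Python below models the input translation as "every CRLF pair becomes LF". That model and the name text_mode_read are assumptions made for the illustration, and the four-byte payload is made up.

```python
import base64


def text_mode_read(data: bytes) -> bytes:
    """Assumed model of a text-mode handle: every CRLF pair becomes LF."""
    return data.replace(b"\r\n", b"\n")


# Binary data that happens to contain the byte pair CR LF (0x0D 0x0A).
payload = bytes([0x00, 0x0D, 0x0A, 0xFF])

intact = base64.b64encode(payload)
damaged = base64.b64encode(text_mode_read(payload))

print(intact)
print(damaged)
print(base64.b64decode(damaged) == payload)  # the round trip no longer matches
```

The encoder itself is doing nothing wrong in this model; it simply never sees the 0x0D byte, so the encoded output describes a different, shorter input.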

I believe that programs reading from stdin are supposed to assume the
text-mode semantic for the handles and behave accordingly. You've
mentioned "cat" and "gzip" doing that very thing. Think there might be a
reason for that?

Indeed, it's theoretically right that any program which performs binary I/O should set stdin/stdout to binary mode for portability. But practically, it would be a lot of work to check that all programs on our system follow the rule, and I think the check can't be perfect. I'd

Regardless of how much work it might be, it's a matter of "due diligence". When you find something that doesn't behave appropriately, report it to the maintainers.

And, in that vein, yes, I acknowledge there are issues with Zsh in this
area.  The premain is one "solution" that works for most cases.  You
appear to have found one case that doesn't work as expected
(congratulations!).  But, as I said, that particular case appears to be
more a matter of what the Stdin handle should be treated as, and of
programs working appropriately with it.

This problem is still under consideration.  Having more than one type of
filesystem mode is part of the equation and attempting to treat that
correctly is somewhat difficult in Zsh.

rather keep all my scripts as LF than break my data by some programs
like base64, so I will continue to use the customized zsh.

If that works for you, great. That's why the source is available. I do hope to get back to this issue at some point. Thanks for pointing it out.

PS: for base64, I will report the problem to bug-coreutils list later.

Excellent. I have a good understanding of what you are testing so I can have another look at how Zsh does its file handles and maybe code a proper fix.

--
Peter A. Castro <doctor@fruitbat.org> or <Peter.Castro@oracle.com>
	"Cats are just autistic Dogs" -- Dr. Tony Attwood


