This is the mail archive of the cygwin@sourceware.cygnus.com mailing list for the Cygwin project.



Re: ASCII and BINARY files. Why?


Fran Litterio wrote:
> 
> Jim Balter wrote:
> 
> >Fran Litterio wrote:
> >> I just tried Notepad under NT 4.0 Workstation, and it works fine on text
> >> files lacking any CRs.
> >
> >Did you check the file size to be *sure* that it does not contain CR's?
> >On my NT 4.0 system, Notepad shows naked linefeeds as solid blocks
> >and does not break the line after them.
> 
> Mea culpa.  I was trusting the output of od -x to show me every byte in
> the text file (even though I wasn't using mount -b).  od -x didn't show
> me any CRs, even though they were there.

I figured as much.  This is exactly the point!  od didn't show you the
CRs because the file was opened in "text" mode and the CRs were
secretly stripped from the file before od ever saw them (not to mention
the fact that nothing past the first ^Z makes it to od, urgh).
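
To make the effect concrete, here is a rough Python sketch of what a
text-mode read does to the bytes on disk before a tool like od sees them.
The function name text_mode_read and the sample bytes are made up for
illustration; this only models the two behaviors described above (CRNL
collapsed to NL, everything from the first ^Z onward dropped) and is not
the actual DLL code.

```python
CTRL_Z = b"\x1a"  # the DOS end-of-file marker byte


def text_mode_read(raw: bytes) -> bytes:
    """Model of a DOS-style text-mode read (illustrative only).

    1. Everything from the first ^Z onward is silently discarded.
    2. Each CR LF pair is collapsed to a bare LF.
    """
    end = raw.find(CTRL_Z)
    if end != -1:
        raw = raw[:end]
    return raw.replace(b"\r\n", b"\n")


# Hypothetical file contents: two CRLF lines, a stray ^Z, then more data.
on_disk = b"one\r\ntwo\r\n\x1athree\r\n"
seen_by_tool = text_mode_read(on_disk)

print("bytes on disk:     ", len(on_disk))
print("bytes the tool saw:", len(seen_by_tool))
print("CR visible to tool:", b"\r" in seen_by_tool)
```

The byte counts differ, which is why checking the file size is a more
trustworthy test than a dump produced through a text-mode read.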

> >unix deals with byte streams, and there are many tools for
> >manipulating them, rather than having systems that think
> >they know what they are doing deleting every byte after a ^Z
> >and destroying valuable work.
> 
> Yes.  I am now completely convinced that gnu-win32 should switch to an
> all-binary-all-the-time scheme.  read() should not convert CRNL to NL
> (nor write() do the reverse).  cat should not have implicit knowledge of
> what a ^Z means (i.e., nothing under UNIX).  The gnu-win32 DLL should
> probably even be made to recognize a ^D typed on the keyboard (not coming
> down a pipe) to mean end-of-file.

It would be nice if it could be done, but since this is only a matter of what
humans type, it does not break anything (other than possibly some
existing documentation) to require people to type ^Z instead of ^D.  Of
course, if it is done, it must be done right: ^D's should *only* be
looked at when coming from a keyboard, nowhere else, and they should cause a
read() to return exactly as the Enter key does but without returning the
^D, so that it is possible to enter terminator-less lines from the
keyboard (e.g., abc^D reads as abc with no newline at the end).
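
A small Python model of that keyboard rule, as a sketch only: the name
keyboard_reads is hypothetical, and the keystrokes are passed in as a
string rather than read from a real console. One thing it assumes beyond
what is said above is the usual UNIX convention that a ^D typed on an
empty line makes read() return zero bytes, which callers treat as
end-of-file.

```python
from typing import List

CTRL_D = "\x04"


def keyboard_reads(keys: str) -> List[str]:
    """Return what successive read() calls would deliver for these keystrokes.

    Enter completes a read and the newline is included in the data.
    ^D completes a read too, but the ^D itself is never delivered.
    A ^D on an empty buffer yields "" (assumed to mean end-of-file).
    """
    reads: List[str] = []
    buf = ""
    for ch in keys:
        if ch == "\n":
            reads.append(buf + "\n")
            buf = ""
        elif ch == CTRL_D:
            reads.append(buf)
            if buf == "":
                break  # zero-length read: stop, nothing more is consumed
            buf = ""
        else:
            buf += ch
    return reads


print(keyboard_reads("abc" + CTRL_D))        # a line with no terminator
print(keyboard_reads("hi\n" + CTRL_D))       # a full line, then end-of-file
```

Because the rule lives only in the keyboard path, a ^D byte arriving from
a file or a pipe would never go through this function and would be passed
along as ordinary data.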

> I'm not opposed to activating these
> non-UNIX behaviors conditionally (i.e., via environment variables, mount
> options, filename prefixes, etc.), but the default behavior should be
> all-binary-all-the-time.

Yes, if people are *so* desperate to get "automatic" handling of all
files as text files whether they are or not, thereby breaking everything
else, they should have to do so explicitly.  A big move in that
direction would be to change mount to default to -b and add a -t
flag.

--
<J Q B>

