This is the mail archive of the cygwin@sourceware.cygnus.com mailing list for the Cygwin project.



Re: ASCII and BINARY files. Why?


Alex Stewart wrote:

> > > POSIX is a flawed standard and always has been.  It is fundamentally
> > > incompatible with the already-established ANSI standard for C programming while
> > > offering no substantial gains in its incompatibility.  For this reason, the
> > > POSIX standard should and must be ignored where such incompatibilities arise as
> > > it is the only sane response to such an asinine flaw.
> >
> > Be careful of who you call asinine.  POSIX *conforms* to ANSI C.
> 
> ANSI C requires that files are opened in text mode by default.

The ANSI C language standard standardized the C language.  Standards
committees for existing ad hoc entities (such as existing programming
languages and operating systems) have a responsibility to cover existing
practice.  The C standard in particular had to have wide applicability.
Since the C language was and is implemented on systems that have a
useful text/binary distinction, such as VMS, and there was existing
technology in the form of the "b" flag, the standards committee
standardized that in the language spec.  In *those implementations that
make such a distinction*, the default mode is text mode.  In those
implementations that do not make such a distinction, the "b" flag is
ignored.  ANSI C does not mandate whether an implementation must make
the distinction.  It certainly does not mandate that implementations
that run on Windows boxes must make the distinction (which makes
Geoff Noer's comment that he is following the ANSI spec rather odd).
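
As a rough illustration of the point about the "b" flag (a sketch only;
the helper name and the scratch file name are made up for this example),
the following writes the same two lines through a stream opened with a
given mode and then counts the bytes that ended up in the file:

```c
#include <assert.h>
#include <stdio.h>

/* Write two lines through a stream opened with `mode`, then reopen the
   file as a binary stream and count the bytes actually stored.
   Returns (size_t)-1 on I/O failure.  The file name is hypothetical. */
size_t bytes_on_disk(const char *mode)
{
    const char *path = "mode_demo.tmp";
    FILE *fp;
    size_t n = 0;

    assert(mode != NULL);
    fp = fopen(path, mode);
    if (fp == NULL)
        return (size_t)-1;
    fputs("one\ntwo\n", fp);      /* 8 characters, 2 of them newlines */
    if (fclose(fp) != 0)
        return (size_t)-1;

    fp = fopen(path, "rb");       /* binary stream: bytes come back as stored */
    if (fp == NULL)
        return (size_t)-1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    remove(path);
    return n;
}
```

On an implementation that makes no distinction, bytes_on_disk("w") and
bytes_on_disk("wb") should both give 8.  On one that expands newlines to
CRLF in text mode, "w" should give 10 and "wb" 8.  Both behaviours are
allowed by ANSI C.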

> POSIX requires
> that there is no distinction between text and binary files.

POSIX also is a standard for an existing entity, namely ***unix***.
POSIX grew out of the /usr/group standard initiated by Heinz Lycklama,
a former boss of mine, VP at Interactive Systems Corporation,
a unix VAR.  In formulating a standard for unix, the POSIX standards
committee had the responsibility to cover existing practice, which
in unix means that files, even those opened via fopen, are byte streams.
Since ANSI C mandates that the default mode for fopen is "text" mode,
but in unix fopen by default opens byte streams, which correspond
to ANSI C's "binary streams", POSIX mandates that there is no
distinction between the two, as ANSI C explicitly allows it to do
(well, technically, it is only *explicit* in a footnote).  Doing
anything else would not have created a *unix* standard, which is what
POSIX is:

	The purpose of this part of ISO/IEC 9945 [POSIX.1 -- jqb] is to
	define a standard operating systems interface and environment
	based upon the UNIX Operating System documentation to support
	application portability at the source level.

To complain that the POSIX standard, by virtue of being what it is
(a standard for unix) is "fatally flawed" is, well, that word you used.
To say that it "offers no substantial gains" is woefully ignorant and
confused.

> These two
> standards can coexist only on underlying systems where there is no distinction
> between file types (such as POSIX OSes).

POSIX and Windows are identical in lacking a distinction between
text and binary files.  The difference is that, because C and unix
were designed together, the mapping from the C newline character to
the unix end-of-line indicator is 1-1, and thus binary and text
streams are equivalent.  This is the crux of the matter, and a point
that people (including myself) often miss or misunderstand in these
discussions.

ANSI C allows implementations to make a text/binary distinction or not.
POSIX, an API for systems in which the line terminators in files
are the same as the line terminators in C, naturally does not make
this distinction.  Windows implementations are in a much more
difficult position, because the line terminator in Windows does not
match that of C, *yet there is no file type in Windows*.  Thus,
it is necessary under Windows to know, when a program writes a newline,
whether it is writing a line terminator or just another byte.
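
To make that concrete, here is a minimal sketch of what a
CRLF-translating text stream does on output.  It is not taken from any
real C library; the function name is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n bytes from in to out, expanding every LF (0x0A) to CR LF, the
   way a CRLF-translating text stream does on output.  out must have
   room for 2*n bytes.  Returns the number of bytes produced. */
size_t lf_to_crlf(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, j = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        if (in[i] == '\n')
            out[j++] = '\r';      /* insert the CR before the LF */
        out[j++] = in[i];
    }
    return j;
}
```

Applied to text, this turns C newlines into Windows line terminators.
Applied to data in which 0x0A is just another byte, it inserts a 0x0D
the program never wrote, which is why the implementation has to be told
which case it is dealing with.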

> Win32 is not a POSIX OS environment

Which hardly makes POSIX "fatally flawed".
Any more than the Win32 API is "fatally flawed" because it isn't
a POSIX API.

> in that it does distinguish between text and binary file types,

This isn't true.  If it were, GNU-win32 would have less of
a problem.  There is nothing in the Win32 API that allows you to
open files in "text" mode or to mark files as being "text" files.
Windows simply uses a different convention for terminating text lines
than does unix/POSIX, one that also is different from the convention
in C.  That's why cygwin's imposing a text/binary distinction has
so many problems.

> and therefore
> ANSI C requires that files be opened with newline conversions by default,

No, it does not.  In an ANSI C implementation in which newlines are
converted, such as VC++, you will not see carriage returns upon
reading autoexec.bat.  In an ANSI C implementation in which newlines
are not converted, such as a unix system reading a copy of autoexec.bat
or reading it via a network mount, or a GNU-win32 system with a
filesystem mounted -b, you will see carriage returns upon reading
autoexec.bat.  ANSI C does not mandate which must occur, and thinking
it does is a major misunderstanding (one that Geoff Noer apparently
shares).  Implementations can do newline translation or not, define
a text/binary distinction or not, at their discretion.  All ANSI C
says is that *if* your implementation defines a text/binary distinction,
fopen opens in text mode unless a "b" flag is provided.  But the
distinction is for the ANSI C implementation, and need not reflect the
underlying system.
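
The autoexec.bat case can be tried with something like the following
sketch.  The helper names and the file contents are invented; a real
autoexec.bat is not needed, only a file whose lines end in CRLF.

```c
#include <assert.h>
#include <stdio.h>

/* Create a small DOS-style file.  "wb" stores the bytes exactly as
   given, so each line ends in CR LF.  Returns 0 on success. */
int write_dos_file(const char *path)
{
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    fputs("@ECHO OFF\r\nSET A=1\r\n", fp);
    return fclose(fp);
}

/* Count the carriage returns the program sees when reading the file
   through a stream opened with `mode`.  Returns -1 if it cannot be
   opened. */
long count_cr(const char *path, const char *mode)
{
    FILE *fp;
    int c;
    long n = 0;

    assert(path != NULL && mode != NULL);
    fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF)
        if (c == '\r')
            n++;
    fclose(fp);
    return n;
}
```

With mode "rb" the count should be 2 everywhere.  With mode "r" it
should be 2 on an implementation that does no translation and 0 on one
that converts CRLF to newline, and ANSI C is satisfied either way.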

> however the POSIX C standard requires that they not be.  This is a fundamental
> incompatibility which renders POSIX inherently _incompatible_ with ANSI C

No, this is a serious misunderstanding at several different levels.
POSIX is entirely compatible with the ANSI C standard, which allows ANSI
C implementations to impose a text/binary distinction or not.  POSIX
simply mandates that those ANSI C implementations that are POSIX
implementations must not impose such a distinction (of course,
POSIX mandates a bunch of other things unrelated to ANSI C, like
providing an API with specific (unix) semantics).

> (please note here that we are discussing the POSIX API standard, not the larger
> POSIX OS standards.  Such issues in an OS specification would easily be
> dismissed by simply saying "well, Win32 isn't a POSIX OS", however the POSIX
> API specification should be applicable in an ANSI C, non-POSIX OS environment

This is utter nonsense.  The POSIX "OS standard" *is* the POSIX API
standard.  There is no "POSIX OS standard" separate from the API.
What in the world do you think the POSIX API *is*??
All the rest of POSIX has to do with even more specific levels above the
API, such as exactly how sh and cpio and termcap function.  The POSIX
API can be layered on other systems, such as Mach or even Windows NT.

> (as is exactly what people are attempting with GNU-Win32), and the fact that it
> cannot be is still a flaw)

POSIX could be exactly emulated on GNU-Win32.  However, the result
wouldn't be very useful, because it wouldn't coexist at the same level
as "native" Windows programs.  It can be usefully emulated somewhat
closely, but there are many POSIX facilities, such as file protection
modes, effective uids, tty modes, ptys, device abstraction, fork,
file locking, etc. etc. that are missing from Win32 or come in a
radically different form or are done poorly.  I don't know how many
people on this list have an appreciation for just how difficult a job it
is to implement GNU/POSIX under Win32.  The POSIX API is not a matter
of line terminators; it is much more than that.

But even if you emulate POSIX somewhat closely, you are left with the
fact that in Windows text lines end with CRLF but in unix text lines
end with LF.  The same would hold true if you tried to emulate
the Win32 API on a unix system.  Programs that do CreateFile and
do their own writing would fail miserably if the emulation magically
transformed CRLF's in the written data to LF's (although they would
fail less often, since CRLF's in binary files are of course less
frequent than are LF's).
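
A sketch of the reverse translation shows the failure.  The function is
hypothetical and stands in for whatever such an emulation layer would do
to written data:

```c
#include <assert.h>
#include <stddef.h>

/* Collapse every CR LF pair in buf to a single LF, in place, and return
   the new length.  A lone CR or a lone LF is left untouched. */
size_t crlf_to_lf(unsigned char *buf, size_t n)
{
    size_t i, j = 0;

    assert(buf != NULL);
    for (i = 0; i < n; i++) {
        if (buf[i] == '\r' && i + 1 < n && buf[i + 1] == '\n')
            continue;             /* drop the CR, keep the LF that follows */
        buf[j++] = buf[i];
    }
    return j;
}
```

A four-byte record 12 0D 0A 34 comes out three bytes long, so any
program that wrote fixed-size records, or that later seeks by offset,
finds its data shifted.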

> > Perhaps someone around here is an idiot and a moron, but it isn't
> > me or those in charge of the GNU project.  Since GNU programs require
> > many POSIX extensions to ANSI C, such as, say, "stat", it is pointless
> > to try to make GNU programs *strictly* ANSI conforming.  But programs
> > that conform to POSIX already conform (but not strictly) to ANSI C.
> 
> They do not conform to ANSI C if they (for example) fopen a file without a "b"
> flag and expect to read/write binary data from it without problems.  

This is simply false.  "conform" is well defined in the standard.
ANSI C allows implementations that do not distinguish between text and
binary. All POSIX implementations are such implementations, including
the Windows NT POSIX implementation.  Of course, printf("hello world\n")
from a program under that implementation will produce a file that
doesn't contain a CR, but nothing in ANSI C or the Win32 API says it
must.

> Many GNU
> utilities do this and are therefore incompatible with the ANSI C standard
> (strictly or otherwise).

Wrong.

> There is no reason for this as it is possible to
> design application code in such a way that it will function correctly under
> both systems, and therefore any code which does not is flawed and requires a
> bug fix.

Programs that, say, print out the number of links to a file, such
as ls, cannot be written in strictly conforming ANSI C, yet ls is
not "incompatible with the ANSI C standard".  Since a large percentage
of GNU programs fall into this category, trying to convert individual
pieces of GNU code to be strictly conforming to ANSI C, rather than
merely conforming to the POSIX extensions, is pointless.
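
For instance, the link count that ls prints comes from stat, which is
POSIX and not ANSI C.  A minimal sketch, with invented helper names:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>   /* POSIX header, not part of ANSI C */

/* Return the hard-link count of path, or -1 if stat() fails. */
long link_count(const char *path)
{
    struct stat sb;

    assert(path != NULL);
    if (stat(path, &sb) != 0)
        return -1;
    return (long)sb.st_nlink;
}

/* Print "<links> <name>", loosely in the manner of one column of ls -l.
   Returns 0 on success, -1 if the file cannot be examined. */
int print_link_count(const char *path)
{
    long n = link_count(path);

    if (n < 0)
        return -1;
    printf("%ld %s\n", n, path);
    return 0;
}
```

Nothing in <stdio.h> or the rest of the ANSI C library can supply this
number, so the program conforms to ANSI C plus the POSIX extensions but
cannot be strictly conforming.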

> If the GNU project will not accept such bug fixes (thus requiring
> their software to be incompatible with ANSI standards) for no reason other than
> "we don't want it that way", then I reiterate my statement that they are
> morons.

I'm sure it gives you a great sense of righteousness when better
informed persons dismiss your ranting, but it won't further your goals.

> > I really don't think that understanding this distinction makes one
> > an idiot or a moron.  I suggest you think twice or more before throwing
> > those words around.
> 
> Understanding the distinction is not the issue.  Requiring incompatibility for
> no reason is the issue, and it is still a valid one.  If you don't like my
> choice of words, fair enough, but it doesn't change the actual issues involved.

You still don't understand the distinction.  Perhaps if you did you
would understand where you have gone wrong. 

> > While strictly conforming ANSI C programs can use fopen(file, "rt"),
> > they cannot use open, O_BIN, O_BINARY, or O_TEXT.  And if they
> > do use "rt", they cannot depend upon its effects and still be strictly
> > conforming.  Since the meaning of none of these is defined by either
> > ANSI C or POSIX, their use is not portable.
> 
> Under an ANSI-compatible system, "t" is unnecessary as all files are opened in
> text mode by default, therefore "depending on its effect" (its effect being
> that it doesn't need one) is a non-issue.  One would only need to depend on its
> effect within an environment which itself did not conform to these standards
> anyway, and therefore the point becomes moot.

It is moot in a limited sense.  ANSI C says that streams default to text
mode, that conforming implementations must accept all strictly
conforming programs, and that other characters are allowed after the
standard prefixes ("r", "rb", "w+", etc.).  Therefore, a conforming
implementation can only interpret "t" in a way that makes no
difference (and not as "time bomb", as someone suggested).

However, a non-conforming implementation could open streams in
binary mode by default, but open them in text mode if the "t"
flag were present.  Such non-conforming implementations generally
have a way to tell them to be conforming.  For example, gcc, which is by
default non-conforming, can be told to conform via a command-line
switch.  A system like GNU-win32 could be non-conforming when mounted
-b (because it makes a text/binary distinction but defaults to binary)
but conforming when mounted -t.  That would make the "t" flag useful
in that context.  However, it would not be portable, in the sense
that there might be some other non-conforming implementation that takes
"t" to mean "time bomb".  
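
The same caveat applies to O_BINARY with open.  One common idiom, which
nobody in this thread proposed and which neither standard sanctions, is
to define the flag away where the system headers do not provide it.  A
sketch, assuming POSIX-style headers such as those cygwin supplies, with
an invented function name:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* O_BINARY is an extension found on systems that translate line
   endings.  Where the headers do not define it, make it a no-op. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Open path read-only with no newline translation.  Returns a file
   descriptor, or -1 on failure. */
int open_untranslated(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_BINARY);
}
```

This only helps with implementations that spell the extension O_BINARY.
It is a convention, not portability in the sense of either standard.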

--
<J Q B>

