This is the mail archive of the cygwin@sourceware.cygnus.com mailing list for the Cygwin project.



RE: b19 file I/O


David O'Riva and I have been debating about the best way to
solve the network i/o performance issue...  Below are some
relevant snippets, forwarded for comment by people more in
the know.

-- 
Jacob Langseth <jlangseth@esisys.com>

 From:	David O'Riva <oriva@agames.com>
 Sent:	Friday, March 27, 1998 4:47 PM
 To:	Jacob Langseth <jlangseth@esisys.com>

[snippage]

Well, I don't know how B18 and B19 differ... The only difference between
them may be in the BLKSIZE returned by stat().  As near as I can tell,
read() is implemented as a 10-line stub around the NT read call (which works
really well when given a large request, and blows when given a sequence of
small requests), whereas fread() goes through some really vicious C++ code
and comes out as a series of BLKSIZE read()s.  Now, gzip and the fileutils
actually use the open()/read() calls, but they use the stat() call to figure
out how large a chunk to read() at a time!
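As a rough illustration of the pattern described above (a sketch, not code taken from gzip or the fileutils), a tool asks fstat() for the file's preferred I/O size, which POSIX exposes as the st_blksize field of struct stat, and then issues read() calls of that size. A small reported block size therefore multiplies the number of calls. The names preferred_chunk() and read_all_counting() and the 1 KB fallback are hypothetical choices for this example.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Pick a chunk size the way the tools described above do: ask fstat()
   for st_blksize. Falls back to a hypothetical 1 KB default if fstat()
   fails or reports nothing useful. */
static size_t preferred_chunk(int fd)
{
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_blksize > 0)
        return (size_t)st.st_blksize;
    return 1024;
}

/* Read everything from fd in chunks of `chunk` bytes. Returns the number
   of read() calls made (including the final one that returns 0 at end of
   file), or -1 on error. The total byte count is stored in *total. */
static long read_all_counting(int fd, size_t chunk, long *total)
{
    char *buf = malloc(chunk);
    long calls = 0;
    ssize_t n;

    if (buf == NULL)
        return -1;
    *total = 0;
    do {
        n = read(fd, buf, chunk);
        calls++;
        if (n > 0)
            *total += n;
    } while (n > 0);
    free(buf);
    return n < 0 ? -1 : calls;
}
```

Reading 10 KB of data with 1 KB chunks takes 11 read() calls (ten data reads plus the one that hits end of file); with a 10 KB chunk it takes 2.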

It looks to me like an architectural issue.  Unix systems are usually pretty
good about reporting useful ideal chunksizes for the various attached
hardware, whereas DOS/Windows has never had particularly good device driver
architecture or implementation.  The Windows file calls are so isolated from
the actual hardware that there may not even BE a useful chunksize for them...

Note that this is an entirely different problem from your original untar.gz
issue - that one is probably explained by different handling of the
permissions in B19 (i.e. maybe B19 is more aggressive about storing
permissions for every file it creates? - dunno).


 From:	David O'Riva <oriva@agames.com>
 Sent:	Friday, March 27, 1998 5:57 PM
 To:	Jacob Langseth <jlangseth@esisys.com>

At 05:33 PM 3/27/98 -0500, you wrote:
>hmmm...
>
>At first I didn't think the problems were separate at all.  If b18 were
>keeping a decent-sized i/o buffer handy internally for read(), write(),
>fread() and fwrite(), BLKSIZE wouldn't really matter, but if b19 changed
>this behavior, it would seem to explain the performance difference.  b19
>was a complete rewrite, after all.
>
>But, thinking back, I remember doing a find on a network drive under b18
>and it *crawled*.  I mean we cut-and-pasted the filenames into Notepad,
>appended "copy " to each, and were done *WAY* before the find-equivalent
>completed, which seems to support both your theory that blksize on
>network i/o bites and the idea that my b18 vs. b19 performance problem
>is a separate issue.
>
>So, now the question is, what to do about the i/o problem?  Ideally 
>something can be done in the dll so that programs don't have to be
>recompiled to use a bighonkinbuffer rather than the blocksize returned
>by stat().  I see two ways to do this:
>    1) have stat() and fstat() lie about the block size and return
>	bighonkinbuf instead.  (I think this would probably be *BAD*,
>	for reasons I can't think of.)

If my (admittedly not in depth) perusal of the sources is accurate, one
could probably just hunt down the actual definition of BLKSIZE (or BSIZE, or
_ST_BLKSIZE, or whatever it's actually called), and multiply it by 10, then
rebuild the DLL and the *utils packages.  This could give a significant
performance boost...

Upping the "assumed" blocksize from 1K to 10K _probably_ wouldn't be too
bad, as long as the change was truly propagated throughout the system.
Since the DLL is sort of a virtualization layer anyway, it's not going to
cause the weird, bad or offset mapping that it could on a real unix system
(I think).
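A minimal sketch of option 1, assuming the "multiply it by 10" idea above: a wrapper that behaves like POSIX fstat() but scales only st_blksize. The names fstat_big_blksize(), inflate_blksize() and BLKSIZE_FACTOR are hypothetical; this is not how the Cygwin DLL actually defines its block size. If equivalent logic lived inside the DLL's own stat()/fstat(), tools that size their buffers from st_blksize would pick up larger chunks once relinked against the rebuilt DLL, without source changes.

```c
#include <sys/stat.h>

/* Hypothetical multiplier: the 1K -> 10K idea discussed above. */
#define BLKSIZE_FACTOR 10

/* Scale only the preferred I/O size; st_size, st_blocks and the rest of
   the structure are passed through unchanged. */
void inflate_blksize(struct stat *st)
{
    st->st_blksize *= BLKSIZE_FACTOR;
}

/* Option 1 as a wrapper: same contract as fstat(), but the reported
   st_blksize is BLKSIZE_FACTOR times larger on success. */
int fstat_big_blksize(int fd, struct stat *st)
{
    int rc = fstat(fd, st);
    if (rc == 0)
        inflate_blksize(st);
    return rc;
}
```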

>    2) if the dll detects that an i/o operation is happening on a file,
>	be it read(), write(), fread(), fwrite() or whatever, it should
>	allocate an internal buffer of size bighonkinbuf, and hand
>	it out to the client in blksize chunks.  This would take some
>	sleight of hand, but probably wouldn't be too bad.  (The C
>	library's stdio layer should already do this for fread() and
>	fwrite(), if I remember my K&R correctly, so it might only be
>	necessary for read() and write().)
>

fread() and fwrite() degenerate into read() and write() calls, so you would
only have to "fix" read() and write().
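A minimal sketch of option 2 at the read() level, assuming a hypothetical big_reader wrapper (not the DLL's actual code) and an arbitrary 64 KB buffer: one large read() refills the internal buffer, and the caller's small requests are served from memory. Writes would need a matching flush-on-full buffer, which is omitted here.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical size of the "bighonkinbuf"; 64 KB is an arbitrary choice. */
#define BIG_BUF_SIZE (64 * 1024)

struct big_reader {
    int fd;
    size_t pos;           /* next unread byte in buf */
    size_t len;           /* number of valid bytes in buf */
    long raw_reads;       /* how many read() calls actually reached the OS */
    char buf[BIG_BUF_SIZE];
};

static void big_reader_init(struct big_reader *r, int fd)
{
    r->fd = fd;
    r->pos = r->len = 0;
    r->raw_reads = 0;
}

/* Hand out up to n bytes. When the internal buffer is empty, refill it
   with one large read(); otherwise copy from memory. Returns the number
   of bytes copied, 0 at end of file, or -1 on error. */
static ssize_t big_read(struct big_reader *r, void *dst, size_t n)
{
    if (r->pos == r->len) {
        ssize_t got = read(r->fd, r->buf, sizeof r->buf);
        r->raw_reads++;
        if (got <= 0)
            return got;
        r->pos = 0;
        r->len = (size_t)got;
    }
    if (n > r->len - r->pos)
        n = r->len - r->pos;
    memcpy(dst, r->buf + r->pos, n);
    r->pos += n;
    return (ssize_t)n;
}
```

With 11 bytes waiting in a pipe and 4-byte requests, the caller makes four big_read() calls (4, 4 and 3 bytes, then end of file) while only two read() calls reach the OS; with larger files the saving grows with the ratio of BIG_BUF_SIZE to the request size.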

>Comments?

It should be possible to find out from NT whether a file is networked or
not.  While one would *assume* that the cygwin32.dll is checking for this
along with all the other checks it does whenever you so much as look
cross-eyed at a file, that may not be the case.  If that information is
available, then maybe the only necessary fix would be to jack up the blksize
returned by stat() for those particular files.
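A sketch of the "only for networked files" variant. On Win32, GetDriveType() reports DRIVE_REMOTE for network drives, which is one plausible way to set the flag, but the detection itself is left out here so the example stays portable (UNC paths and mapped drives may need separate handling). The function reported_blksize() and the 64 KB floor are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimum block size reported for files on network drives. */
#define REMOTE_MIN_BLKSIZE ((size_t)64 * 1024)

/* Decide which st_blksize stat() should report. Local files keep the
   normal value; networked files get at least REMOTE_MIN_BLKSIZE. How
   is_remote is determined is up to the caller. */
static size_t reported_blksize(size_t normal, bool is_remote)
{
    if (is_remote && normal < REMOTE_MIN_BLKSIZE)
        return REMOTE_MIN_BLKSIZE;
    return normal;
}
```

Local files are untouched, so only the slow network case pays for the larger buffers that tools like cp will malloc() from this value.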

The tools that I looked at (cp in particular) didn't seem to make very many
assumptions about what stat() would return - they just blithely malloc()ed
whatever the blocksize was and dealt with it...

-dave                                         ________  _______ ____
---------------------------------------------/_ \     \/|  ___ \    \- -
David O'Riva - Staff Programmer                /|  |\  \|  ____/ /\  \
   oriva@agames.com                            /|  |/  /|  |\/\  \/  /\
                                              __|_____o/|__o\ /\____o\ 
                                          ---/________\/____\--/____\-- -
DISCLAIMER: Any opinions expressed here are mine, *not* my employer's.


